Abstract

The aim of this paper is to establish a global asymptotic equivalence between
the experiments generated by the discrete (high frequency) or continuous
observation of a path of a Levy process and a Gaussian white noise experiment
observed up to a time $T$, with $T$ tending to $\infty$. These approximations are
given in the sense of the Le Cam distance, under some smoothness
conditions on the unknown Levy density. All the asymptotic equivalences are
established by constructing explicit Markov kernels that can be used to
reproduce one experiment from the other.
1. Introduction

Levy processes are a fundamental tool in modelling situations, like the 
dynamics of asset prices and weather measurements, where sudden changes 
in values may happen. For that reason they are widely employed, among 
many other fields, in mathematical finance. To name a simple example, the 
price of a commodity at time t is commonly given as an exponential function 
of a Levy process. In general, exponential Levy models are proposed for their 
ability to take into account several empirical features observed in the returns 
of assets such as heavy tails, high-kurtosis and asymmetry (see [15] for an 
introduction to financial applications). 

From a mathematical point of view, Levy processes are a natural
extension of the Brownian motion which preserves the tractable statistical
properties of its increments, while relaxing the continuity of paths. The jump
dynamics of a Levy process is dictated by its Levy density, say $f$. If $f$ is
continuous, its value at a point $x_0$ determines how frequently jumps of size
close to $x_0$ occur per unit time. Concretely, if $X$ is a pure jump Levy
process with Levy density $f$, then the function $f$ is such that


$$\int_A f(x)\,dx = \frac{1}{t}\,\mathbb E\Big[\sum_{s\le t}\mathbb I_A(\Delta X_s)\Big]$$

for any Borel set $A$ and $t > 0$. Here, $\Delta X_s = X_s - X_{s^-}$ denotes the magnitude
of the jump of $X$ at time $s$ and $\mathbb I_A$ is the indicator function of $A$. Thus, the
Levy measure

$$\nu(A) := \int_A f(x)\,dx$$

is the average number of jumps (per unit time) whose magnitudes fall in the
set $A$. Understanding the behavior of the jumps therefore requires estimating the
Levy measure. Several recent works have treated this problem, see e.g. [2]
for an overview.
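The counting identity above can be checked by simulation. The sketch below assumes, purely for illustration, a compound Poisson process with jump intensity `rate` and Uniform(0, 1) jump sizes, so that its Levy density is $f \equiv$ `rate` on $[0,1]$ and $\nu((a,b]) = \text{rate}\,(b-a)$; the function name `mean_jump_count` is hypothetical.

```python
import numpy as np


def mean_jump_count(rate, t, a, b, n_paths, rng):
    """Monte Carlo estimate of (1/t) E[#{s <= t : Delta X_s in (a, b]}] for a
    compound Poisson process with intensity `rate` and Uniform(0, 1) jump sizes."""
    counts = np.empty(n_paths)
    for k in range(n_paths):
        n_jumps = rng.poisson(rate * t)            # number of jumps on [0, t]
        sizes = rng.uniform(0.0, 1.0, n_jumps)     # jump magnitudes Delta X_s
        counts[k] = np.count_nonzero((sizes > a) & (sizes <= b))
    return counts.mean() / t


rng = np.random.default_rng(0)
rate, t, a, b = 3.0, 10.0, 0.2, 0.5
estimate = mean_jump_count(rate, t, a, b, n_paths=20_000, rng=rng)
exact = rate * (b - a)   # nu((a, b]) = int_a^b f(x) dx with f = rate on [0, 1]
print(f"simulated {estimate:.4f}  vs  nu((a, b]) = {exact:.4f}")
```

The simulated average number of jumps per unit time with size in $(a,b]$ should agree with $\nu((a,b])$ up to Monte Carlo error.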

When the available data consist of the whole trajectory of the process
during a time interval $[0,T]$, the problem of estimating $f$ may be reduced
to estimating the intensity function of an inhomogeneous Poisson process
(see, e.g. [23, 42]). However, a continuous-time sampling is never available
in practice and thus the relevant problem is that of estimating $f$ based on
discrete sample data $X_{t_0},\dots,X_{t_n}$ during a time interval $[0,T_n]$. In that case,
the jumps are latent (unobservable) variables and that clearly adds to the
difficulty of the problem. From now on we will place ourselves in a
high-frequency setting, that is, we assume that the sampling interval $\Delta_n = t_i - t_{i-1}$
tends to zero as $n$ goes to infinity. Such a high-frequency based statistical
approach has played a central role in the recent literature on nonparametric
estimation for Levy processes (see e.g. [22, 13, 14, 1, 19]). Moreover, in order
to make consistent estimation possible, we will also require the observation time
$T_n$ to tend to infinity, which allows the identification of the jump part in
the limit.

Our aim is to prove that, under suitable hypotheses, estimating the Levy
density $f$ is equivalent to estimating the drift of an adequate Gaussian white
noise model. In general, asymptotic equivalence results for statistical
experiments provide a deeper understanding of statistical problems and allow one to
single out their main features. The idea is to pass via asymptotic equivalence
to another experiment which is easier to analyze. By definition, two sequences
of experiments $\mathscr P_{1,n}$ and $\mathscr P_{2,n}$, defined on possibly different sample spaces,
but with the same parameter set, are asymptotically equivalent if the Le
Cam distance $\Delta(\mathscr P_{1,n},\mathscr P_{2,n})$ tends to zero. For $\mathscr P_i = (\mathscr X_i,\mathscr A_i,\{P_{i,\theta}:\theta\in\Theta\})$,
$i = 1,2$, $\Delta(\mathscr P_1,\mathscr P_2)$ is the symmetrization of the deficiency $\delta(\mathscr P_1,\mathscr P_2)$, where

$$\delta(\mathscr P_1,\mathscr P_2) = \inf_K\sup_{\theta\in\Theta}\big\|KP_{1,\theta} - P_{2,\theta}\big\|_{TV}.$$

Here the infimum is taken over all randomizations $K$ from $(\mathscr X_1,\mathscr A_1)$ to $(\mathscr X_2,\mathscr A_2)$
and $\|\cdot\|_{TV}$ denotes the total variation distance. Roughly speaking, the Le
Cam distance quantifies how much one fails to reconstruct (with the help of a
randomization) a model from the other one and vice versa. Therefore, the statement
$\Delta(\mathscr P_1,\mathscr P_2) = 0$ can be interpreted as “the models $\mathscr P_1$ and $\mathscr P_2$ contain
the same amount of information about the parameter $\theta$.” The general
definition of randomization is quite involved but, in the most frequent examples
(namely when the sample spaces are Polish and the experiments dominated),
it reduces to that of a Markov kernel. One of the most important features of
the Le Cam distance is that it can also be interpreted in terms of statistical
decision theory (see [32, 33]; a short review is presented in the Appendix).
As a consequence, saying that two statistical models are equivalent means
that any statistical inference procedure can be transferred from one model to
the other in such a way that the asymptotic risk remains the same, at least
for bounded loss functions. Also, as soon as two models $\mathscr P_{1,n}$ and $\mathscr P_{2,n}$ that
share the same parameter space $\Theta$ are proved to be asymptotically
equivalent, the same result automatically holds for the restrictions of both
$\mathscr P_{1,n}$ and $\mathscr P_{2,n}$ to a smaller subclass of $\Theta$.
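On finite sample spaces a randomization is simply a row-stochastic matrix, and since $\delta$ is an infimum over kernels, any fixed kernel $K$ yields an upper bound $\sup_\theta\|KP_{1,\theta}-P_{2,\theta}\|_{TV}$ on the deficiency. The sketch below illustrates this on toy experiments with two parameter values (the function names and the numbers are hypothetical), using $\|P-Q\|_{TV} = \sup_A|P(A)-Q(A)| = \frac12\sum_x|p(x)-q(x)|$.

```python
import numpy as np


def tv(p, q):
    """Total variation sup_A |P(A) - Q(A)| = 0.5 * sum |p - q| on a finite space."""
    return 0.5 * np.abs(p - q).sum()


def deficiency_upper_bound(K, P1, P2):
    """sup_theta ||K P_{1,theta} - P_{2,theta}||_TV for one fixed Markov kernel K.

    P1: (n_theta, n1) array, row i is P_{1,theta_i} on a space with n1 points.
    P2: (n_theta, n2) array, row i is P_{2,theta_i}.
    K:  (n1, n2) row-stochastic matrix, K[x, y] = probability of sending x to y.
    """
    return max(tv(P1[i] @ K, P2[i]) for i in range(P1.shape[0]))


P1 = np.array([[0.1, 0.2, 0.3, 0.4],     # P_{1, theta_1}
               [0.4, 0.3, 0.2, 0.1]])    # P_{1, theta_2}
K = np.array([[1.0, 0.0],                # merge the points {0, 1} and {2, 3}
              [1.0, 0.0],
              [0.0, 1.0],
              [0.0, 1.0]])
P2 = P1 @ K                              # a coarsened (garbled) experiment
print(deficiency_upper_bound(K, P1, P2))

K_back = np.array([[0.5, 0.5, 0.0, 0.0],  # try to undo the merge by splitting evenly
                   [0.0, 0.0, 0.5, 0.5]])
print(deficiency_upper_bound(K_back, P2, P1))
```

The first bound is zero up to rounding, since $\mathscr P_2$ is exactly a randomization of $\mathscr P_1$. The second kernel only gives an upper bound (here $0.1$) on $\delta(\mathscr P_2,\mathscr P_1)$, reflecting that merging points can lose information about $\theta$.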

Historically, the first results of asymptotic equivalence in a nonparametric
context date from 1996 and are due to [5] and [39]. The first two authors
showed the asymptotic equivalence of nonparametric regression and a
Gaussian white noise model, while the third one showed that of density estimation
and white noise. Over the years many generalizations of these results have
been proposed, such as [3, 28, 43, 11, 10, 40, 12, 37, 46] for nonparametric
regression or [9, 31, 4] for nonparametric density estimation models. Another
very active field of study is that of diffusion experiments. The first result
of equivalence between diffusion models and Euler scheme was established
in 1998, see [38]. In later papers generalizations of this result have been
considered (see [24, 35]). Among others we can also cite equivalence results
for generalized linear models [27], time series [29, 38], diffusion models [18, 25,
16, 17], GARCH model [7], functional linear regression [36], spectral density
estimation [26] and volatility estimation [41]. Negative results are somewhat
harder to come by; the most notable among them are [20, 6, 48]. There is,
however, a lack of equivalence results concerning processes with jumps. A first
result in this sense is [34], in which global asymptotic equivalences between the
experiments generated by the discrete or continuous observation of a path of
a Levy process and a Gaussian white noise experiment are established. More
precisely, in that paper, we have shown that estimating the drift function
$h$ from a continuously or discretely (high frequency) observed time inhomogeneous
jump-diffusion process

$$X_t = \int_0^t h(s)\,ds + \int_0^t\sigma(s)\,dW_s + \sum_{i=1}^{N_t}Y_i,\qquad t\in[0,T_n], \tag{1}$$

is asymptotically equivalent to estimating $h$ in the Gaussian model:

$$dy_t = h(t)\,dt + \sigma(t)\,dW_t,\qquad t\in[0,T_n].$$

Here we try to push the analysis further and we focus on the case in
which the considered parameter is the Levy density and $X = (X_t)$ is a
pure jump Levy process (see [8] for the interest of such a class of processes
when modelling asset returns). More precisely, we consider the problem of
estimating the Levy density (with respect to a fixed, possibly infinite, Levy
measure $\nu_0$ concentrated on $I\subseteq\mathbb R$) $f := \frac{d\nu}{d\nu_0} : I\to\mathbb R$ from a continuously
or discretely observed pure jump Levy process $X$ with possibly infinite Levy
measure. Here $I\subseteq\mathbb R$ denotes a possibly infinite interval and $\nu_0$ is supposed
to be absolutely continuous with respect to Lebesgue with a strictly positive
density $g := \frac{d\nu_0}{d\mathrm{Leb}}$. In the case where $\nu$ is of finite variation one may write:

$$X_t = \sum_{0<s\le t}\Delta X_s \tag{2}$$

or, equivalently, $X$ has a characteristic function given by:

$$\mathbb E\big[e^{iuX_t}\big] = \exp\Big(-t\int_I\big(1 - e^{iuy}\big)\,\nu(dy)\Big).$$




We suppose that the function $f$ belongs to some a priori set $\mathscr F$, nonparametric
in general. The discrete observations are of the form $X_{t_i}$, where
$t_i = T_n\frac{i}{n}$, $i = 0,\dots,n$, with $T_n = n\Delta_n\to\infty$ and $\Delta_n\to 0$ as $n$ goes to infinity.
We will denote by $\mathscr P_n^{\nu_0}$ the statistical model associated with the continuous
observation of a trajectory of $X$ until time $T_n$ (which is supposed to go to
infinity as $n$ goes to infinity) and by $\mathscr Q_n^{\nu_0}$ the one associated with the
observation of the discrete data $(X_{t_i})_{i=0}^n$. The aim of this paper is to prove that,
under adequate hypotheses on $\mathscr F$ (for example, $f$ must be bounded away
from zero and infinity; see Section 2.1 for a complete definition), the models
$\mathscr P_n^{\nu_0}$ and $\mathscr Q_n^{\nu_0}$ are both asymptotically equivalent to a sequence of Gaussian
white noise models of the form:

$$dy_t = \sqrt{f(t)}\,dt + \frac{1}{2\sqrt{T_n}}\,\frac{dW_t}{\sqrt{g(t)}},\qquad t\in I.$$
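To make the Gaussian white noise model concrete, here is a minimal Euler-scheme sketch that simulates one trajectory of $dy_t = \sqrt{f(t)}\,dt + dW_t/(2\sqrt{T_n g(t)})$ on a grid. The test functions ($f(t) = 1+t$ and $g\equiv 1$ on $I=[0,1]$), the grid and the function name are illustrative assumptions; the point is that the noise level $1/(2\sqrt{T_n})$ shrinks as $T_n\to\infty$, so that $y_1$ concentrates around $\int_0^1\sqrt{f(t)}\,dt$.

```python
import numpy as np


def simulate_white_noise(sqrt_f, g, t_grid, T_n, rng):
    """Euler sketch of dy_t = sqrt(f(t)) dt + dW_t / (2 sqrt(T_n g(t))), with y = 0 at t_grid[0]."""
    dt = np.diff(t_grid)
    left = t_grid[:-1]                          # left endpoints of the grid cells
    drift = sqrt_f(left) * dt
    dW = rng.normal(0.0, np.sqrt(dt))           # Brownian increments over each cell
    noise = dW / (2.0 * np.sqrt(T_n * g(left)))
    return np.concatenate(([0.0], np.cumsum(drift + noise)))


rng = np.random.default_rng(1)
t_grid = np.linspace(0.0, 1.0, 1001)
y = simulate_white_noise(lambda t: np.sqrt(1.0 + t), lambda t: np.ones_like(t),
                         t_grid, T_n=1e4, rng=rng)
target = (2.0 / 3.0) * (2.0 ** 1.5 - 1.0)     # int_0^1 sqrt(1 + t) dt
print(y[-1], target)
```

With $T_n = 10^4$ the standard deviation of $y_1$ is $1/(2\sqrt{T_n}) = 0.005$, so the simulated endpoint stays close to the integral of the drift.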
As a corollary, we then get the asymptotic equivalence between $\mathscr P_n^{\nu_0}$ and $\mathscr Q_n^{\nu_0}$.
The main results are precisely stated as Theorems 2.5 and 2.6. A
particular case of special interest arises when $X$ is a compound Poisson process,
$\nu_0 = \mathrm{Leb}([0,1])$ and $\mathscr F\subseteq\mathscr F^{[0,1]}_{(\gamma,K,\kappa,M)}$ where, for fixed $\gamma\in(0,1]$ and $K,\kappa,M$
strictly positive constants, $\mathscr F^I_{(\gamma,K,\kappa,M)}$ is a class of continuously differentiable
functions on $I$ defined as follows:

$$\mathscr F^I_{(\gamma,K,\kappa,M)} = \big\{f : \kappa\le f(x)\le M,\ |f'(x) - f'(y)|\le K|x-y|^\gamma,\ \forall x,y\in I\big\}. \tag{3}$$

In this case, the statistical models $\mathscr P_n^{\nu_0}$ and $\mathscr Q_n^{\nu_0}$ are both equivalent to the
Gaussian white noise model:

$$dy_t = \sqrt{f(t)}\,dt + \frac{1}{2\sqrt{T_n}}\,dW_t,\qquad t\in[0,1].$$

See Example 3.1 for more details. By a theorem of Brown and Low in [5],
we obtain, a posteriori, an asymptotic equivalence with the regression model

$$Y_i = \sqrt{f\Big(\frac{i}{T_n}\Big)} + \frac12\,\xi_i,\qquad\xi_i\sim\mathcal N(0,1),\quad i = 1,\dots,[T_n].$$

Note that a similar form of a Gaussian shift was found to be asymptotically
equivalent to a nonparametric density estimation experiment, see [39]. Let
us mention that we also treat some explicit examples where $\nu_0$ is neither
finite nor compactly supported (see Examples 3.2 and 3.3).

Without entering into any detail, we remark here that the methods are
very different from those in [34]. In particular, since $f$ belongs to the
discontinuous part of a Levy process, rather than its continuous part, the
Girsanov-type changes of measure are irrelevant here. We thus need new instruments,
like the Esscher changes of measure.

Our proof is based on the construction, for any given Levy measure $\nu$,
of two adequate approximations $\nu_m$ and $\bar\nu_m$ of $\nu$. The idea of discretizing
the Levy density already appeared in an earlier work with P. Etore and
S. Louhichi [21]. The present work is also inspired by the papers [9] (for
a multinomial approximation), [4] (for passing from independent Poisson
variables to independent normal random variables) and [34] (for a Bernoulli
approximation). This method allows us to construct explicit Markov kernels
that lead from one model to the other; these may be applied in practice to
transfer minimax estimators.

The paper is organized as follows: Sections 2.1 and 2.2 are devoted to
making the parameter space and the considered statistical experiments precise.
The main results are given in Section 2.3, followed by Section 3 in which some
examples can be found. The proofs are postponed to Section 4. The paper
includes an Appendix recalling the definition and some useful properties of
the Le Cam distance as well as of Levy processes.


2. Assumptions and main results 

2.1. The parameter space 

Consider a (possibly infinite) Levy measure $\nu_0$ concentrated on a possibly
infinite interval $I\subseteq\mathbb R$, admitting a density $g > 0$ with respect to Lebesgue.
The parameter space of the experiments we are concerned with is a class of
functions $\mathscr F = \mathscr F^{\nu_0,I}$ defined on $I$ that form a class of Levy densities with
respect to $\nu_0$: for each $f\in\mathscr F$, let $\nu$ (resp. $\nu_m$) be the Levy measure having
$f$ (resp. $f_m$) as a density with respect to $\nu_0$ where, for every $f\in\mathscr F$, $f_m(x)$
is defined as follows.

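Suppose first $x > 0$. Given a positive integer depending on $n$, $m = m_n$,
let $J_j := (v_{j-1}, v_j]$, where $v_1 = \varepsilon_m\ge 0$ and the $v_j$ are chosen in such a way that

$$\mu_m := \nu_0(J_j) = \frac{\nu_0\big((I\setminus[0,\varepsilon_m])\cap\mathbb R_+\big)}{m-1},\qquad\forall j = 2,\dots,m. \tag{4}$$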


In the sequel, for the sake of brevity, we will only write $m$ without making
explicit the dependence on $n$. Define $x_j^* := \frac{\int_{J_j}x\,\nu_0(dx)}{\mu_m}$ and introduce a sequence
of functions $0\le V_j\le\frac{1}{\mu_m}$, $j = 2,\dots,m$, supported on $[x_{j-1}^*, x_{j+1}^*]$ if $j = 3,\dots,m-1$,
on $[\varepsilon_m, x_3^*]$ if $j = 2$ and on $(I\setminus[0,x_{m-1}^*])\cap\mathbb R_+$ if $j = m$. The
$V_j$'s are defined recursively in the following way.

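• $V_2$ is equal to $\frac{1}{\mu_m}$ on the interval $(\varepsilon_m, x_2^*]$ and on the interval $(x_2^*, x_3^*]$
it is chosen so that it is continuous (in particular, $V_2(x_2^*) = \frac{1}{\mu_m}$),
$\int_{x_2^*}^{x_3^*}V_2(y)\,\nu_0(dy) = \frac{\nu_0((x_2^*, v_2])}{\mu_m}$ and $V_2(x_3^*) = 0$.

• For $j = 3,\dots,m-1$ define $V_j$ as the function $\frac{1}{\mu_m} - V_{j-1}$ on the
interval $[x_{j-1}^*, x_j^*]$. On $[x_j^*, x_{j+1}^*]$ choose $V_j$ continuous and such that
$\int_{x_j^*}^{x_{j+1}^*}V_j(y)\,\nu_0(dy) = \frac{\nu_0((x_j^*, v_j])}{\mu_m}$ and $V_j(x_{j+1}^*) = 0$.

• Finally, let $V_m$ be the function supported on $(I\setminus[0,x_{m-1}^*])\cap\mathbb R_+$ such
that

$$V_m(x) = \frac{1}{\mu_m} - V_{m-1}(x)\quad\text{for } x\in[x_{m-1}^*, x_m^*],\qquad V_m(x) = \frac{1}{\mu_m}\quad\text{for } x\in(I\setminus[0,x_m^*])\cap\mathbb R_+.$$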
(It is immediate to check that such a choice is always possible.) Observe
that, by construction,

$$\sum_{j=2}^m\mu_m V_j(x) = 1,\quad\forall x\in(I\setminus[0,\varepsilon_m])\cap\mathbb R_+,\qquad\text{and}\qquad\int_{(I\setminus[0,\varepsilon_m])\cap\mathbb R_+}V_j(y)\,\nu_0(dy) = 1.$$

Analogously, define $\mu_m^- := \frac{\nu_0((I\setminus[-\varepsilon_m,0])\cap\mathbb R_-)}{m-1}$ and $J_{-m},\dots,J_{-2}$ such that
$\nu_0(J_{-j}) = \mu_m^-$ for all $j$. Then, for $x < 0$, $x_{-j}^*$ is defined as $x_j^*$ by using $J_{-j}$ and
$\mu_m^-$ instead of $J_j$ and $\mu_m$, and the $V_{-j}$'s are defined with the same procedure
as the $V_j$'s, starting from $V_{-2}$ and proceeding by induction.

Define

$$f_m(x) = \mathbb I_{[-\varepsilon_m,\varepsilon_m]}(x) + \sum_{j=2}^m\Big[V_j(x)\int_{J_j}f(y)\,\nu_0(dy) + V_{-j}(x)\int_{J_{-j}}f(y)\,\nu_0(dy)\Big]. \tag{5}$$

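The definitions of the $V_j$'s above are modeled on the following example:

Example 2.1. Let $\nu_0$ be the Lebesgue measure on $[0,1]$ and $\varepsilon_m = 0$. Then
$v_j = \frac{j-1}{m-1}$ and $x_j^* = \frac{2j-3}{2m-2}$, $j = 2,\dots,m$. The standard choice for $V_j$ (based on
the construction by [9]) is given by the piecewise linear functions interpolating
the values in the points $x_k^*$ specified above, namely $V_j(x_j^*) = \frac{1}{\mu_m} = m-1$ and
$V_j(x_k^*) = 0$ for $k\neq j$, with $V_2$ and $V_m$ constant on $[0,x_2^*]$ and $[x_m^*,1]$, respectively.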



[…] rate of convergence of the $L^2$ norm between the restrictions of $f$ and $f_m$ to
$I\setminus[-\varepsilon_m,\varepsilon_m]$ is compatible with the rate of convergence of the other
quantities appearing in the statements of Theorems 2.5 and 2.6. For that reason,
as in [9], we have not chosen a piecewise constant approximation of $f$ but
an approximation that is, at least in the simplest cases, a piecewise linear
approximation of $f$. Such a choice allows us to gain an order of magnitude
on the convergence rate of $\|f - f_m\|_{L^2(\nu_0|_{I\setminus[-\varepsilon_m,\varepsilon_m]})}$, at least when $\mathscr F$ is a class
of sufficiently smooth functions.
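The construction of Example 2.1 can be sketched numerically. The code below is an illustration under the stated assumptions ($\nu_0$ Lebesgue on $[0,1]$, $\varepsilon_m = 0$, integrals $\int_{J_j}f$ computed by a midpoint rule); the function name `example_2_1` is hypothetical. It builds the piecewise linear $V_j$ and the approximation $f_m(x) = \sum_j V_j(x)\int_{J_j}f(y)\,dy$ from (5), so that $f_m$ linearly interpolates the cell averages of $f$ placed at the points $x_j^*$.

```python
import numpy as np


def example_2_1(f, m, n_quad=2000):
    """Sketch of Example 2.1: nu_0 = Lebesgue on [0, 1], eps_m = 0.

    Returns mu_m, the points x_j^* (j = 2..m), the functions V_j and f_m.
    """
    mu = 1.0 / (m - 1)                                  # mu_m = nu_0(J_j)
    v = np.linspace(0.0, 1.0, m)                        # v_j = (j - 1)/(m - 1), j = 1..m
    x_star = (2.0 * np.arange(2, m + 1) - 3.0) / (2.0 * m - 2.0)   # x_j^*

    # c[j - 2] = int_{J_j} f(y) dy via a midpoint rule on J_j = (v_{j-1}, v_j]
    c = np.empty(m - 1)
    for k in range(m - 1):
        h = (v[k + 1] - v[k]) / n_quad
        nodes = v[k] + h * (np.arange(n_quad) + 0.5)
        c[k] = f(nodes).sum() * h

    def V(j, x):
        """Piecewise linear V_j with V_j(x_j^*) = 1/mu_m and V_j(x_k^*) = 0, k != j.
        np.interp extends constantly outside [x_2^*, x_m^*], which gives the flat
        pieces of V_2 on [0, x_2^*] and of V_m on [x_m^*, 1]."""
        values = np.zeros(m - 1)
        values[j - 2] = 1.0 / mu
        return np.interp(x, x_star, values)

    def f_m(x):
        """f_m(x) = sum_j V_j(x) int_{J_j} f, as in (5) with eps_m = 0."""
        return sum(V(j, x) * c[j - 2] for j in range(2, m + 1))

    return mu, x_star, V, f_m


m = 11
mu, x_star, V, f_m = example_2_1(lambda x: 1.0 + x, m)
grid = np.linspace(0.0, 1.0, 501)
partition_of_unity = sum(mu * V(j, grid) for j in range(2, m + 1))
print(np.max(np.abs(partition_of_unity - 1.0)))
```

For a linear $f$ the cell average on $J_j$ equals $f(x_j^*)$, so in this sketch $f_m$ reproduces $f$ exactly between $x_2^*$ and $x_m^*$; a piecewise constant approximation would not.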

We now explain the assumptions we will need to make on the parameter
$f\in\mathscr F = \mathscr F^{\nu_0,I}$. The superscripts $\nu_0$ and $I$ will be suppressed whenever no
confusion can arise. We require that:
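(H1) There exist constants $\kappa, M > 0$ such that $\kappa\le f(y)\le M$, for all $y\in I$
and $f\in\mathscr F$.

For every integer $m = m_n$, we can consider $\sqrt f_m$, the approximation of $\sqrt f$
constructed as $f_m$ above, i.e.

$$\sqrt f_m(x) = \mathbb I_{[-\varepsilon_m,\varepsilon_m]}(x) + \sum_{\substack{j=-m\\ j\notin\{-1,0,1\}}}^m V_j(x)\int_{J_j}\sqrt{f(y)}\,\nu_0(dy),$$

and introduce the quantities:

$$A_m^2(f) := \int_{I\setminus[-\varepsilon_m,\varepsilon_m]}\Big(\sqrt f_m(y) - \sqrt{f(y)}\Big)^2\,\nu_0(dy),$$

$$B_m^2(f) := \sum_{j\notin\{-1,0,1\}}\Big(\int_{J_j}\frac{\sqrt{f(y)}}{\sqrt{\nu_0(J_j)}}\,\nu_0(dy) - \sqrt{\nu(J_j)}\Big)^2,$$

$$C_m^2(f) := \int_{-\varepsilon_m}^{\varepsilon_m}\Big(\sqrt{f(t)} - 1\Big)^2\,\nu_0(dt).$$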
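As a sanity check on $B_m^2(f) = \sum_j\big(\int_{J_j}\sqrt f\,d\nu_0/\sqrt{\nu_0(J_j)} - \sqrt{\nu(J_j)}\big)^2$, the sketch below evaluates it in the setting of Example 2.1 ($\nu_0$ Lebesgue on $[0,1]$, $\varepsilon_m = 0$) for the illustrative choice $f(x) = e^{-\lambda x}$, for which $\int_{J_j}f$ and $\int_{J_j}\sqrt f$ have closed forms. By the Cauchy–Schwarz inequality each summand compares $\int_{J_j}\sqrt f\,d\nu_0/\sqrt{\nu_0(J_j)}$ with its upper bound $\sqrt{\nu(J_j)}$, so $B_m^2(f)$ vanishes for constant $f$ and is small when $f$ varies little on each $J_j$. The function name is hypothetical.

```python
import numpy as np


def B2_exponential(lam, m):
    """B_m^2(f) for f(x) = exp(-lam * x), nu_0 = Lebesgue on [0, 1], eps_m = 0:
    sum_j ( int_{J_j} sqrt(f) / sqrt(nu_0(J_j)) - sqrt(nu(J_j)) )^2."""
    v = np.linspace(0.0, 1.0, m)               # J_j = (v_{j-1}, v_j], j = 2..m
    a, b = v[:-1], v[1:]
    nu0_J = b - a                               # nu_0(J_j) = mu_m
    nu_J = (np.exp(-lam * a) - np.exp(-lam * b)) / lam                       # int_{J_j} f
    int_sqrt_f = 2.0 * (np.exp(-lam * a / 2) - np.exp(-lam * b / 2)) / lam   # int_{J_j} sqrt(f)
    return np.sum((int_sqrt_f / np.sqrt(nu0_J) - np.sqrt(nu_J)) ** 2)


for m in (10, 20, 40):
    print(m, B2_exponential(1.0, m))
```

For this smooth $f$ the printed values decrease roughly like $m^{-4}$, i.e. much faster than the $1/(n\Delta_n)$ scale that Condition (C2) requires once $m$ grows suitably with $n$.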


The conditions defining the parameter space are expressed by asking that
the quantities introduced above converge quickly enough to zero. To state
the assumptions of Theorem 2.5 precisely, we will assume the existence of
sequences of discretizations $m = m_n\to\infty$, of positive numbers $\varepsilon_m = \varepsilon_{m_n}\to 0$
and of functions $V_j$, $j = \pm2,\dots,\pm m$, such that:

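(C1) $\displaystyle\lim_{n\to\infty}n\Delta_n\sup_{f\in\mathscr F}\int_{I\setminus(-\varepsilon_m,\varepsilon_m)}\big(f(x) - f_m(x)\big)^2\,\nu_0(dx) = 0.$

(C2) $\displaystyle\lim_{n\to\infty}n\Delta_n\sup_{f\in\mathscr F}\big(A_m^2(f) + B_m^2(f) + C_m^2(f)\big) = 0.$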


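Remark in particular that Condition (C2) implies the following:

(H2) $\displaystyle\sup_{f\in\mathscr F}\int_I\big(\sqrt{f(y)} - 1\big)^2\,\nu_0(dy)\le L,$

where $L = \sup_{f\in\mathscr F}\int_{-\varepsilon_m}^{\varepsilon_m}\big(\sqrt{f(x)} - 1\big)^2\,\nu_0(dx) + (\sqrt M + 1)^2\,\nu_0\big(I\setminus(-\varepsilon_m,\varepsilon_m)\big)$, for
any choice of $m$ such that the quantity in the limit appearing in Condition
(C2) is finite.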

Theorem 2.6 has slightly stronger hypotheses, defining possibly smaller
parameter spaces: we will assume the existence of sequences $m_n$, $\varepsilon_m$ and $V_j$,
$j = \pm2,\dots,\pm m$ (possibly different from the ones above) such that Condition
(C1) is verified and the following stronger version of Condition (C2) holds:



= 0 . 



Finally, some of our results have a more explicit statement under the
hypothesis of finite variation, which we state as:

(FV) $\int_I(|x|\wedge 1)\,\nu_0(dx) < \infty$.

Remark 2.3. Condition (C1) and the conditions involving the quantities $A_m(f)$
and $B_m(f)$ all concern similar but slightly different approximations of $f$.
In concrete examples, they may all be expected to have the same rate of
convergence, but to keep the greatest generality we preferred to state them
separately. On the other hand, conditions on the quantity $C_m(f)$ are purely
local around zero, requiring the parameters $f$ to converge quickly enough to
1 as $x\to 0$.


Examples 2.4. To get a grasp on Conditions (C1), (C2) we analyze here
three different examples according to the different behavior of $\nu_0$ near $0\in I$.
In all of these cases the parameter space $\mathscr F^{\nu_0,I}$ will be a subclass of $\mathscr F^I_{(\gamma,K,\kappa,M)}$
defined as in (3). Recall that the conditions (C1), (C2) and (C2’) depend
on the choice of sequences $m_n$, $\varepsilon_m$ and functions $V_j$. For the first two of the
three examples, where $I = [0,1]$, we will make the standard choice for $V_j$
of triangular and trapezoidal functions, similarly to those in Example 2.1.
Namely, for $j = 3,\dots,m-1$ we have

$$V_j(x) = \mathbb I_{(x_{j-1}^*,x_j^*]}(x)\,\frac{x - x_{j-1}^*}{(x_j^* - x_{j-1}^*)\,\mu_m} + \mathbb I_{(x_j^*,x_{j+1}^*]}(x)\,\frac{x_{j+1}^* - x}{(x_{j+1}^* - x_j^*)\,\mu_m}; \tag{6}$$

the two extremal functions $V_2$ and $V_m$ are chosen so that $V_2 = \frac{1}{\mu_m}$ on $(\varepsilon_m, x_2^*]$
and $V_m = \frac{1}{\mu_m}$ on $(x_m^*, 1]$. In the second example, where $\nu_0$ is infinite, one
is forced to take $\varepsilon_m > 0$ and to keep in mind that the $x_j^*$ are not uniformly
distributed on $[\varepsilon_m, 1]$. Proofs of all the statements here can be found in
Section 5.2.

1. The finite case: $\nu_0 = \mathrm{Leb}([0,1])$.

In this case we are free to choose $\mathscr F^{\nu_0,[0,1]} = \mathscr F^{[0,1]}_{(\gamma,K,\kappa,M)}$. Indeed, as $\nu_0$
is finite, there is no need to single out the first interval $J_1 = [0,\varepsilon_m]$, so that
$C_m(f)$ does not enter in the proofs and the definitions of $A_m(f)$ and $B_m(f)$
involve integrals on the whole of $[0,1]$. Also, the choice of the $V_j$'s as in (6)
guarantees that $\int_0^1 V_j(x)\,dx = 1$. Then, the quantities $\|f - f_m\|_{L^2([0,1])}$, $A_m(f)$
and $B_m(f)$ all have the same rate of convergence, which is given by:


(j{x) - u 0 (dx) + A m (f) + B m (f) 


O 


m 7 1 + rn 


uniformly on $\mathscr F$. See Section 5.2 for a proof.
2. The finite variation case: $\frac{d\nu_0}{d\mathrm{Leb}}(x) = x^{-1}\mathbb I_{[0,1]}(x)$.

In this case, the parameter space is a proper subset of $\mathscr F^{[0,1]}_{(\gamma,K,\kappa,M)}$.
Indeed, as we are obliged to choose $\varepsilon_m > 0$, we also need to impose that
$C_m(f)$ converges to zero quickly enough, with uniform constants with respect to $f$, that is, that
all $f\in\mathscr F$ converge to 1 quickly enough as $x\to 0$. Choosing $\varepsilon_m = m^{-1-\alpha}$,
$\alpha > 0$, we have that $\mu_m = \frac{(1+\alpha)\ln m}{m-1}$, $v_j = \varepsilon_m^{\frac{m-j}{m-1}}$ and $x_j^* = \frac{v_j - v_{j-1}}{\mu_m}$. In particular,
$\max_j|v_{j-1} - v_j| = |v_m - v_{m-1}| = 1 - \varepsilon_m^{\frac{1}{m-1}}$. Also in this case one can prove
that the standard choice of $V_j$ described above leads to $\int_0^1 V_j(x)\,\frac{dx}{x} = 1$.

Again, the quantities $\|f - f_m\|_{L^2(\nu_0|_{I\setminus[0,\varepsilon_m]})}$, $A_m(f)$ and $B_m(f)$ have the same
rate of convergence given by:

J (/W-/«W) l'o(dx) + A m(f) + B m (f) = 

(7) 

uniformly on $\mathscr F$. The condition on $C_m(f)$ depends on the behavior of $f$ near
0. For example, it is ensured if one considers a parametric family of the form
$f(x) = e^{-\lambda x}$ with a bounded $\lambda > 0$. See Section 5.2 for a proof.

3. The infinite variation, non-compactly supported case: $\frac{d\nu_0}{d\mathrm{Leb}}(x) = x^{-2}\mathbb I_{\mathbb R_+}(x)$.

This example involves significantly more computations than the preceding
ones, since the classical triangular choice for the functions $V_j$ would not have
integral equal to 1 (with respect to $\nu_0$), and the support is not compact.
The parameter space $\mathscr F^{\nu_0,[0,\infty)}$ can still be chosen as a proper subclass of
$\mathscr F^{[0,\infty)}_{(\gamma,K,\kappa,M)}$, again by imposing that $C_m(f)$ converges to zero quickly enough
(more details about this condition are discussed in Example 3.3). We divide
the interval $[0,\infty)$ in $m$ intervals $J_j = [v_{j-1}, v_j)$ with:

$$v_0 = 0,\quad v_1 = \varepsilon_m,\quad v_j = \frac{\varepsilon_m(m-1)}{m-j},\quad v_m = \infty,\quad\mu_m = \frac{1}{\varepsilon_m(m-1)}.$$

To deal with the non-compactness problem, we choose some “horizon” $H(m)$
that goes to infinity slowly enough as $m$ goes to infinity and we bound the
L -2 distance between / and f m for x > H(m ) by 2 sup . . We have: 

x>H(m) 

11/ - /mllLwiAIO^]) + A ’nX) + BlU) = O ^ ™P m| ) ' 

In the general case where the best estimate for $\sup_{x>H(m)}f(x)^2$ is simply given by
$M^2$, an optimal choice for $H(m)$ is $\sqrt{\varepsilon_m m}$, which gives a rate of convergence:

11 / - fmWl 2 (u 0 \I\[0,e m ]) + A m(f) + B m(f) = 0 ^ ’ 

independently of $\gamma$. See Section 5.2 for a proof.
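The closed-form partitions in items 2 and 3 above can be checked numerically. The sketch below (hypothetical function names; the values of $m$, $\alpha$ and $\varepsilon_m$ are arbitrary) builds the points $v_j$ for $\nu_0(dx) = x^{-1}dx$ on $[0,1]$ with $\varepsilon_m = m^{-1-\alpha}$, and for $\nu_0(dx) = x^{-2}dx$ on $\mathbb R_+$, and verifies that every cell $J_j$, $j\ge 2$, has $\nu_0$-mass $\mu_m$, using $\nu_0((a,b]) = \ln(b/a)$ and $\nu_0([a,b)) = 1/a - 1/b$, respectively.

```python
import numpy as np


def partition_dx_over_x(m, alpha):
    """Item 2: nu_0(dx) = dx / x on [0, 1], eps_m = m^(-1-alpha)."""
    eps = float(m) ** (-1.0 - alpha)
    mu = (1.0 + alpha) * np.log(m) / (m - 1)        # nu_0((eps_m, 1]) / (m - 1)
    j = np.arange(1, m + 1)
    v = eps ** ((m - j) / (m - 1))                   # v_1 = eps_m, ..., v_m = 1
    x_star = (v[1:] - v[:-1]) / mu                   # x_j^* = int_{J_j} x nu_0(dx) / mu_m
    return eps, mu, v, x_star


def partition_dx_over_x2(m, eps):
    """Item 3: nu_0(dx) = dx / x^2 on R_+, cells J_j = [v_{j-1}, v_j)."""
    j = np.arange(1, m)                              # j = 1..m-1
    v = np.append(eps * (m - 1) / (m - j), np.inf)   # v_1 = eps_m, ..., v_m = +inf
    mu = 1.0 / (eps * (m - 1))
    return mu, v


eps, mu, v, x_star = partition_dx_over_x(m=50, alpha=0.5)
masses = np.log(v[1:] / v[:-1])                      # nu_0(J_j), j = 2..m
print(np.allclose(masses, mu), v[-1] - v[-2], 1.0 - eps ** (1.0 / 49))

mu3, v3 = partition_dx_over_x2(m=50, eps=0.01)
masses3 = 1.0 / v3[:-1] - 1.0 / v3[1:]               # uses 1 / inf == 0.0
print(np.allclose(masses3, mu3))
```

In the $x^{-1}$ case the $x_j^*$ are logarithmic means of the endpoints of $J_j$, so they lie strictly inside each cell and are far from uniformly spread on $[\varepsilon_m,1]$; the largest gap between consecutive $v_j$ is the last one.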


2.2. Definition of the experiments 

Let $(x_t)_{t\ge0}$ be the canonical process on the Skorokhod space $(D,\mathscr D)$ and
denote by $P^{(b,0,\nu)}$ the law induced on $(D,\mathscr D)$ by a Levy process with
characteristic triplet $(b,0,\nu)$. We will write $P_t^{(b,0,\nu)}$ for the restriction of $P^{(b,0,\nu)}$ to
the $\sigma$-algebra $\mathscr D_t$ generated by $\{x_s : 0\le s\le t\}$ (see Appendix A.2 for the
precise definitions). Let $Q_t^{(b,0,\nu)}$ be the marginal law at time $t$ of a Levy
process with characteristic triplet $(b,0,\nu)$. In the case where $\int_{|y|\le1}|y|\,\nu(dy) < \infty$
we introduce the notation $\gamma^\nu := \int_{|y|\le1}y\,\nu(dy)$; then, Condition (H2)
guarantees the finiteness of $\gamma^{\nu-\nu_0}$ (see Remark 33.3 in [44] for more details).

Recall that we introduced the discretization $t_i = T_n\frac{i}{n}$ of $[0,T_n]$ and denote
by $Q_n^{(\gamma^{\nu-\nu_0},0,\nu)}$ the laws of the $n+1$ marginals of $(x_t)_{t\ge0}$ at times $t_i$,
$i = 0,\dots,n$. We will consider the following statistical models, depending on
a fixed, possibly infinite, Levy measure $\nu_0$ concentrated on $I$ (clearly, the
models with the subscript FV are meaningful only under the assumption
(FV)):

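$$\mathscr P_{n,FV}^{\nu_0} = \Big(D,\mathscr D_{T_n},\Big\{P_{T_n}^{(\gamma^{\nu},0,\nu)} : f := \frac{d\nu}{d\nu_0}\in\mathscr F^{\nu_0,I}\Big\}\Big),$$

$$\mathscr P_n^{\nu_0} = \Big(D,\mathscr D_{T_n},\Big\{P_{T_n}^{(\gamma^{\nu-\nu_0},0,\nu)} : f := \frac{d\nu}{d\nu_0}\in\mathscr F^{\nu_0,I}\Big\}\Big),$$

$$\mathscr Q_{n,FV}^{\nu_0} = \Big(\mathbb R^{n+1},\mathscr B(\mathbb R^{n+1}),\Big\{Q_n^{(\gamma^{\nu},0,\nu)} : f := \frac{d\nu}{d\nu_0}\in\mathscr F^{\nu_0,I}\Big\}\Big),$$

$$\mathscr Q_n^{\nu_0} = \Big(\mathbb R^{n+1},\mathscr B(\mathbb R^{n+1}),\Big\{Q_n^{(\gamma^{\nu-\nu_0},0,\nu)} : f := \frac{d\nu}{d\nu_0}\in\mathscr F^{\nu_0,I}\Big\}\Big).$$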


Finally, let us introduce the Gaussian white noise model that will appear in
the statement of our main results. For that, let us denote by $(C(I),\mathscr C)$ the
space of continuous mappings from $I$ into $\mathbb R$ endowed with its standard
filtration, and by $g$ the density of $\nu_0$ with respect to the Lebesgue measure. We will
require $g > 0$ and let $\mathbb W_n^f$ be the law induced on $(C(I),\mathscr C)$ by the stochastic
process satisfying:

$$dy_t = \sqrt{f(t)}\,dt + \frac{dW_t}{2\sqrt{T_n}\sqrt{g(t)}},\qquad t\in I, \tag{8}$$
where $(W_t)_{t\in\mathbb R}$ denotes a Brownian motion on $\mathbb R$ with $W_0 = 0$. Then we set:

$$\mathscr W_n^{\nu_0} = \big(C(I),\mathscr C,\{\mathbb W_n^f : f\in\mathscr F^{\nu_0,I}\}\big).$$

Observe that when $\nu_0$ is a finite Levy measure, then $\mathscr W_n^{\nu_0}$ is equivalent to
the statistical model associated with the continuous observation of a process
$(y_t)_{t\in I}$ defined by:

$$dy_t = \sqrt{f(t)g(t)}\,dt + \frac{dW_t}{2\sqrt{T_n}},\qquad t\in I.$$


2.3. Main results 

Using the notation introduced in Section 2.1, we now state our main
results. For brevity of notation, we will denote by $H(f,f_m)$ (resp. $L_2(f,f_m)$)
the Hellinger distance (resp. the $L^2$ distance) between the Levy measures $\nu$
and $\nu_m$ restricted to $I\setminus[-\varepsilon_m,\varepsilon_m]$, i.e.:

$$H^2(f,f_m) := \int_{I\setminus[-\varepsilon_m,\varepsilon_m]}\Big(\sqrt{f(x)} - \sqrt{f_m(x)}\Big)^2\,\nu_0(dx),$$

$$L_2(f,f_m)^2 := \int_{I\setminus[-\varepsilon_m,\varepsilon_m]}\big(f(y) - f_m(y)\big)^2\,\nu_0(dy).$$


Observe that Condition (H1) implies (see Lemma 5.1)
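For the reader's convenience, here is an elementary derivation of one inequality of this type, sketched under (H1) only (it is not claimed to be the exact statement of Lemma 5.1). On $(I\setminus[0,\varepsilon_m])\cap\mathbb R_+$ one has $f_m(x) = \sum_j V_j(x)\int_{J_j}f\,d\nu_0\ge\kappa\mu_m\sum_j V_j(x) = \kappa$, using $V_j\ge 0$, $\nu_0(J_j) = \mu_m$ and $\sum_j\mu_m V_j\equiv 1$; the case $x < 0$ is analogous. Hence $\sqrt f + \sqrt{f_m}\ge 2\sqrt\kappa$ outside $[-\varepsilon_m,\varepsilon_m]$, and

$$\begin{aligned}
H^2(f,f_m) &= \int_{I\setminus[-\varepsilon_m,\varepsilon_m]}\frac{\big(f(x) - f_m(x)\big)^2}{\big(\sqrt{f(x)} + \sqrt{f_m(x)}\big)^2}\,\nu_0(dx)\\
&\le\frac{1}{4\kappa}\int_{I\setminus[-\varepsilon_m,\varepsilon_m]}\big(f(x) - f_m(x)\big)^2\,\nu_0(dx) = \frac{L_2(f,f_m)^2}{4\kappa}.
\end{aligned}$$

In particular, under (H1), any rate obtained for $L_2(f,f_m)$ transfers to the Hellinger distance $H(f,f_m)$ up to the constant $1/(2\sqrt\kappa)$.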

Theorem 2.5. Let $\nu_0$ be a known Levy measure concentrated on a (possibly
infinite) interval $I\subseteq\mathbb R$ and having strictly positive density with respect to
the Lebesgue measure. Let us choose a parameter space $\mathscr F^{\nu_0,I}$ such that there
exist a sequence $m = m_n$ of integers, functions $V_j$, $j = \pm2,\dots,\pm m$, and a
sequence $\varepsilon_m\to 0$ as $m\to\infty$ such that Conditions (H1), (C1), (C2) are
satisfied for $\mathscr F = \mathscr F^{\nu_0,I}$. Then, for $n$ big enough we have:


A(^<\#7) = o(V^n sup (A m (f)+B m (f)+C m {f) 
\ \ 




+ O ( yjnA n sup L 2 (/, f m ) + 
V fe* 


m 


1 l 

- + -))• (9) 


Tl/\ n ' fA> 


Theorem 2.6. Let $\nu_0$ be a known Levy measure concentrated on a (possibly
infinite) interval $I\subseteq\mathbb R$ and having strictly positive density with respect to
the Lebesgue measure. Let us choose a parameter space $\mathscr F^{\nu_0,I}$ such that there
exist a sequence $m = m_n$ of integers, functions $V_j$, $j = \pm2,\dots,\pm m$, and a
sequence $\varepsilon_m\to 0$ as $m\to\infty$ such that Conditions (H1), (C1), (C2’) are
satisfied for $\mathscr F = \mathscr F^{\nu_0,I}$. Then, for $n$ big enough we have:



Corollary 2.7. Let $\nu_0$ be as above and let us choose a parameter space $\mathscr F^{\nu_0,I}$
so that there exist sequences $m'_n$, $\varepsilon'_m$, $V'_j$ and $m''_n$, $\varepsilon''_m$, $V''_j$ such that:

• Conditions (HI), (Cl) and (C2) hold for m' n , e' m , V', and -—f- 

j TiL\ n y 

—b- ) tends to zero. 

• Conditions (HI), (Cl) and (C2’) hold for rn”, e" m , V)', and u 0 (j \ 

[— e m"i £ m"]^ y/nA\ + tends to zero. 

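Then the statistical models $\mathscr P_n^{\nu_0}$ and $\mathscr Q_n^{\nu_0}$ are asymptotically equivalent:

$$\lim_{n\to\infty}\Delta\big(\mathscr P_n^{\nu_0},\mathscr Q_n^{\nu_0}\big) = 0.$$

If, in addition, the Levy measures have finite variation, i.e. if we assume
(FV), then the same results hold replacing $\mathscr P_n^{\nu_0}$ and $\mathscr Q_n^{\nu_0}$ by $\mathscr P_{n,FV}^{\nu_0}$ and
$\mathscr Q_{n,FV}^{\nu_0}$, respectively (see Lemma A.14 in the Appendix).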

3. Examples 

We will now analyze three different examples, underlining the different
behaviors of the Levy measure $\nu_0$ (respectively, finite, infinite with finite
variation and infinite with infinite variation). The three chosen Levy measures
are $\mathbb I_{[0,1]}(x)\,dx$, $\mathbb I_{[0,1]}(x)\,\frac{dx}{x}$ and $\mathbb I_{\mathbb R_+}(x)\,\frac{dx}{x^2}$. In all three cases we assume the
parameter $f$ to be uniformly bounded and with uniformly $\gamma$-Hölder derivatives:
we will describe adequate subclasses $\mathscr F^{\nu_0,I}\subseteq\mathscr F^I_{(\gamma,K,\kappa,M)}$ defined as in (3). It
seems very likely that the same results that are highlighted in these
examples hold true for more general Levy measures; however, we limit ourselves
to these examples in order to be able to explicitly compute the quantities
involved ($v_j$, $x_j^*$, etc.) and hence estimate the distance between $f$ and $f_m$ as
in Examples 2.4.
In the first of the three examples, where $\nu_0$ is the Lebesgue measure
on $I = [0,1]$, we are considering the statistical models associated with the
discrete and continuous observation of a compound Poisson process with Levy
density $f$. Observe that $\mathscr W_n^{\mathrm{Leb}}$ reduces to the statistical model associated with
the continuous observation of a trajectory from:

$$dy_t = \sqrt{f(t)}\,dt + \frac{1}{2\sqrt{T_n}}\,dW_t,\qquad t\in[0,1].$$

In this case we have: 


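Example 3.1. (Finite Levy measure). Let $\nu_0$ be the Lebesgue measure on
$I = [0,1]$ and let $\mathscr F = \mathscr F^{\mathrm{Leb},[0,1]}$ be any subclass of $\mathscr F^{[0,1]}_{(\gamma,K,\kappa,M)}$ for some strictly
positive constants $K$, $\kappa$, $M$ and $\gamma\in(0,1]$. Then:

$$\lim_{n\to\infty}\Delta\big(\mathscr P_{n,FV}^{\mathrm{Leb}},\mathscr W_n^{\mathrm{Leb}}\big) = 0\qquad\text{and}\qquad\lim_{n\to\infty}\Delta\big(\mathscr Q_{n,FV}^{\mathrm{Leb}},\mathscr W_n^{\mathrm{Leb}}\big) = 0.$$

More precisely,

$$\Delta\big(\mathscr P_{n,FV}^{\mathrm{Leb}},\mathscr W_n^{\mathrm{Leb}}\big) = \begin{cases}O\Big((n\Delta_n)^{-\frac{\gamma}{4+2\gamma}}\Big) & \text{if }\gamma\in(0,\tfrac12],\\[4pt] O\Big((n\Delta_n)^{-\frac{1}{10}}\Big) & \text{if }\gamma\in(\tfrac12,1].\end{cases}$$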


In the case where $\Delta_n = n^{-\beta}$, $\frac12 < \beta < 1$, an upper bound for the rate of
convergence of $\Delta(\mathscr Q_{n,FV}^{\mathrm{Leb}},\mathscr W_n^{\mathrm{Leb}})$ is


A(J2; 


n,FV ’ " n 


= < 


f _ 7+/3 

(71 n 4 + 2 -y In n 
O \nh~P In 

/ 9/14-1 


Ofn-^ln 

n 9 P In n 


n 


if 7 G (0, |) and §±fj < p < 1, 
if 7 G (0, |) and | < 0 < f±g, 
if 7 G [|, l] and | < /3 < 1, 
if 7 G [|, l] and | < /3 < §. 


See Section 5.3 for a proof. 


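Example 3.2. (Infinite Levy measure with finite variation). Let $X$ be a
truncated Gamma process with (infinite) Levy measure of the form:

$$\nu(A) = \int_A\frac{e^{-\lambda x}}{x}\,dx,\qquad A\in\mathscr B([0,1]).$$

Here $\mathscr F^{\nu_0,[0,1]}$ is a 1-dimensional parametric family in $\lambda$, assuming that there
exists a known constant $\lambda_0$ such that $0 < \lambda\le\lambda_0 < \infty$, $f(t) = e^{-\lambda t}$ and
$d\nu_0(x) = \frac{dx}{x}$. In particular, the $f$ are Lipschitz, i.e. $\mathscr F^{\nu_0,[0,1]}\subset\mathscr F^{[0,1]}_{(\gamma=1,K,\kappa,M)}$.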
The discrete or continuous observation (up to time $T_n$) of $X$ is
asymptotically equivalent to the statistical model associated with the observation
of a trajectory of the process $(y_t)$:

$$dy_t = \sqrt{f(t)}\,dt + \frac{\sqrt t\,dW_t}{2\sqrt{T_n}},\qquad t\in[0,1].$$

More precisely, in the case where $\Delta_n = n^{-\beta}$, $\frac12 < \beta < 1$, an upper bound for
the rate of convergence of $\Delta(\mathscr Q_{n,FV}^{\nu_0},\mathscr W_n^{\nu_0})$ is


A(«S 


t'O 

n,FVi 


C°) 


O (n 2 P In n) if | < /? < ^ 

Inn) if ^ < 1. 


Concerning the continuous setting we have: 

A(^D=o(^(lnn) 1 ) =0(T,^(lnT n ) § ). 
See Section 5.4 for a proof. 


Example 3.3. (Infinite Levy measure, infinite variation). Let $X$ be a pure
jump Levy process with infinite Levy measure of the form:

$$\nu(A) = \int_A\frac{2 - e^{-\lambda x^3}}{x^2}\,dx,\qquad A\in\mathscr B(\mathbb R_+).$$


Again, we are considering a parametric family in $\lambda > 0$, assuming that
the parameter stays bounded below a known constant $\lambda_0$. Here,
$f(t) = 2 - e^{-\lambda t^3}$, hence $1\le f(t)\le 2$ for all $t\ge 0$, and $f$ is Lipschitz, i.e.
$\mathscr F^{\nu_0,\mathbb R_+}\subset\mathscr F^{\mathbb R_+}_{(\gamma=1,K,\kappa,M)}$. The discrete or continuous observations (up to time $T_n$) of $X$
are asymptotically equivalent to the statistical model associated with the
observation of a trajectory of the process $(y_t)$:

$$dy_t = \sqrt{f(t)}\,dt + \frac{t\,dW_t}{2\sqrt{T_n}},\qquad t\ge 0.$$

More precisely, in the case where $\Delta_n = n^{-\beta}$, $0 < \beta < 1$, an upper bound for
the rate of convergence of $\Delta(\mathscr Q_n^{\nu_0},\mathscr W_n^{\nu_0})$ is

if I < 0 < if 

0(n _ s + w (lnn)s) if < /3 < 1. 

In the continuous setting, we have 


A (T,r) = 


A(^°,F;°) = 0(71^(Inn)®) = 0(T n 34 (InT n )®). 


See Section 5.5 for a proof. 





4. Proofs of the main results 


In order to simplify notations, the proofs will be presented in the case
$I\subseteq\mathbb R_+$. Nevertheless, this allows us to present all the main difficulties,
since they can only appear near 0. To prove Theorems 2.5 and 2.6 we need
to introduce several intermediate statistical models. In that regard, let us
denote by $Q_j$ the law of a Poisson random variable with mean $T_n\nu(J_j)$ (see
(4) for the definition of $J_j$). We will denote by $\mathscr L_m$ the statistical model
associated with the family of probabilities

$$\Big\{\bigotimes_{j=2}^m Q_j : f\in\mathscr F\Big\}. \tag{11}$$

By $N_j$ we mean the law of a Gaussian random variable $\mathcal N\big(2\sqrt{T_n\nu(J_j)},1\big)$
and by $\mathscr N_m$ the statistical model associated with the family of probabilities

$$\Big\{\bigotimes_{j=2}^m N_j : f\in\mathscr F\Big\}. \tag{12}$$



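For each $f\in\mathscr F$, let $\bar\nu_m$ be the measure having $\bar f_m$ as a density with
respect to $\nu_0$ where, for every $f\in\mathscr F$, $\bar f_m$ is defined as follows.

$$\bar f_m(x) = \begin{cases}1 & \text{if } x\in J_1,\\[4pt] \dfrac{\nu(J_j)}{\nu_0(J_j)} & \text{if } x\in J_j,\ j = 2,\dots,m.\end{cases} \tag{13}$$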


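Furthermore, define

$$\bar{\mathscr P}_{n,m}^{\nu_0} = \Big(D,\mathscr D_{T_n},\Big\{P_{T_n}^{(\gamma^{\bar\nu_m-\nu_0},0,\bar\nu_m)} : \frac{d\bar\nu_m}{d\nu_0} = \bar f_m,\ f\in\mathscr F\Big\}\Big). \tag{14}$$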


4.1. Proof of Theorem 2.5

We begin with a series of lemmas that will be needed in the proof. Before
doing so, let us underline the scheme of the proof. We recall that the goal
is to prove that estimating $f = \frac{d\nu}{d\nu_0}$ from the continuous observation of a
Levy process $(X_t)_{t\in[0,T_n]}$ without Gaussian part and having Levy measure $\nu$
is asymptotically equivalent to estimating $f$ from the Gaussian white noise
model:

$$dy_t = \sqrt{f(t)}\,dt + \frac{1}{2\sqrt{T_n g(t)}}\,dW_t,\qquad g = \frac{d\nu_0}{d\mathrm{Leb}},\quad t\in I.$$
Also, recall the definition of $\nu_m$ given in (5) and read $\Longleftrightarrow$ as
“is asymptotically equivalent to”. Then, we can outline the proof in the
following way.

• Step 1: $P_{T_n}^{(\gamma^{\nu-\nu_0},0,\nu)}\Longleftrightarrow P_{T_n}^{(\gamma^{\nu_m-\nu_0},0,\nu_m)}$;

• Step 2: $P_{T_n}^{(\gamma^{\nu_m-\nu_0},0,\nu_m)}\Longleftrightarrow\bigotimes_{j=2}^m\mathcal P\big(T_n\nu(J_j)\big)$ (Poisson approximation).
Here $\bigotimes_{j=2}^m\mathcal P(T_n\nu(J_j))$ represents a statistical model associated with
the observation of $m-1$ independent Poisson r.v. of parameters $T_n\nu(J_j)$;

• Step 3: $\bigotimes_{j=2}^m\mathcal P\big(T_n\nu(J_j)\big)\Longleftrightarrow\bigotimes_{j=2}^m\mathcal N\big(2\sqrt{T_n\nu(J_j)},1\big)$ (Gaussian
approximation);

• Step 4: $\bigotimes_{j=2}^m\mathcal N\big(2\sqrt{T_n\nu(J_j)},1\big)\Longleftrightarrow(y_t)_{t\in I}$.
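The Gaussian approximation in Step 3 rests on the variance-stabilizing property of the square root: if $N\sim\mathcal P(\lambda)$ with $\lambda$ large, then $2\sqrt N$ is approximately $\mathcal N(2\sqrt\lambda,1)$, which is why the Gaussian means are $2\sqrt{T_n\nu(J_j)}$ with unit variance. The Monte Carlo sketch below only checks the first two moments for one arbitrary value of $\lambda$; it is not the Le Cam distance bound of [4], which requires an explicit randomization.

```python
import numpy as np

rng = np.random.default_rng(2)
lam = 400.0                                     # plays the role of T_n * nu(J_j)
samples = 2.0 * np.sqrt(rng.poisson(lam, size=200_000))
print(samples.mean(), 2.0 * np.sqrt(lam))       # mean close to 2 sqrt(lam)
print(samples.var())                            # variance close to 1
```

The variance of $2\sqrt N$ does not depend on $\lambda$ to first order, so all $m-1$ Gaussian coordinates share the same unit variance; only their means carry information about $f$.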


Lemmas 4.1-4.3, below, are the key ingredients of Step 2. 

Lemma 4.1. Let and ££ m be the statistical models defined in (14) and 
(11), respectively. Under the Assumption (H2) we have: 


A= 0, for all m. 


Proof. Denote by N = N U {00} and consider the statistics S : (D, Q)t u ) —>■ 
P(N m_1 )) defined by 


S{x) = (a?; 2 , ■ • •, N Tf m ) with N T n 3 = Y I l( A ^)- ( 15 ) 

' r<T n 

An application of Theorem Appendix A. 12 to P^? ™ °’ 0 ’ i/ra ^ and Pj^°’ l '°\ 
yields 

j ~p(Y rn / m / / j \ \ p \ 

<*>-«p(E ( hl Ufij~])) N ^~ Tn Jfirn{y)-i)Mdy)f 

Hence, by means of the Fisher factorization theorem, we conclude that S is 
a sufficient statistics for PAff. Furthermore, under P|e ^ ie ranf ] om 

variables have Poisson distributions Qj with means T n v(Jj). Then, by 
means of Property Appendix A.7, we get A (7^°, A? m ) = 0, for all m. □ 

Let us denote by Qj the law of a Poisson random variable with mean 
T n Jj fm{y)n 0 (dy) and let be the statistical model associated with the 
family of probabilities {C^"l 2 Qj '■ f e 
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Lemma 4.2. 


A (Jf m , -Kt) < SUp . / — 


(f(y) - fm(y)) 2 M d v)- 


feJ? y K J /\[0,£rn] 

Proof. By means of Facts Appendix A.2-Appendix A.4, we get: 


/ m m \ 

A(Jf m , %m ) < SUp H[ 


< sup 


\ 


3 =2 


= sup 


V2 


\ 


Y ( 1 - exp 

3=2 ' 


T 

1 n. 


f(y)Mdy) 


f(y)M d y ) 


By making use of the fact that 1 — e x < x for all x > 0 and the equality 
yfa. — Vb = jY/b com bi n ed with the lower bound / > k (that also implies 

f m > k) and finally the Cauchy-Schwarz inequality, we obtain: 


T 

i t ±n 

1 - <>x l> ( “ g 


n 2 


f(y)M d y)-\ f(y)Mdy) 


T n 

< — 
~ 2 


n 2 


< 


T n V : 


f(y)Mdy)-\ f(y)Mdy) 


fjAfiv) ~ fm.(y))Mdy) 


K10 0 (Jj 


T 


< ^ / (/(?/) - /m(j/)) Mdy)- 


Hence, 


FT 


Qb<g)Q f i )<\Hr 

3=2 3=2 


'l\[0,e„ 


(. f(y ) - f m (y)) 2 Mdy)- 


□ 


Lemma 4.3. Let z> m and u m the Levy measures defined as in (5) and (13), 
respectively. For every f G there exists a Markov kernel K such that 


18 



Proof. By construction, u m and u m coincide on [0, £ m ]. Let us denote by 
&™ s and the restriction on / \ [0, £ m ] of v m and u m respectively, then 

it is enough to prove: KPp n m ) 

observe that the kernel M: 


= Pf y "" L First of all, let us 


771 ,, 

M (x, A) = I J.(x) / V j (y)u 0 (dy), x e I\[ 0, e m J, A e ^(/ \ [0, e m ]) 

j=2 ^ 

is defined in such a way that MTAfff = Indeed, for all A e &(I\ [0, e m ]), 

771 n 771 /» / n \ 

MC(^) = E / M(x,A)v™(dx) = Y / / / v j(y)M d y)] p m( dx ) 


3 =2 J Jj 


3 =2 ' 7 J ' 


{j A v M<dy)) u i J j) = f A fm(y)Mdy ) = C s (^)- ( 16 ) 


J=2 


Observe that (y 1 '™ 8- *' 0 , q, p^ s ) and (y 1 '™ 8- *' 0 , 0, z>£f) are Levy triplets asso¬ 
ciated with compound Poisson processes since P^ s and z>^ s are finite Levy 
measures. The Markov kernel Jl interchanging the laws of the Levy pro¬ 
cesses is constructed explicitly in the case of compound Poisson processes. 
Indeed if X is the compound Poisson process having Levy measure P ? r ® s , then 
X t = Yi, where N t is a Poisson process of intensity t rn := 9™ S (I\ [0, £ m ]) 
and the Yj are i.i.d. random variables with probability law y-P^ s . Moreover, 
given a trajectory of X, both the trajectory (rit)te[o,T n ] of the Poisson process 
(fVt)te[o,T n ] and the realizations y t of Y tl i = 1,..., rix n are uniquely deter¬ 
mined. This allows us to construct riT n i.i.d. random variables as follows: 
For every realization & of Y{, we dehne the realization yi of Y t by throwing 
it according to the probability law M(|/j, •). Hence, thanks to (16), (Lj)j 
are i.i.d. random variables with probability law The desired Markov 

kernel K (defined on the Skorokhod space) is then given by: 


K : (X t ) te[0) T n ] 
Finally, observe that, since 




fm(y)Mdy) = 


N t 

i =i / te[o,T„] 


f(y)M d y) = 


fm(y)Mdy ), 


(Ab)t e [o,T„] is a compound Poisson process with Levy measure z>. 
Let us now state two lemmas needed to understand Step 4. 


res 

777 • 


□ 
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Lemma 4.4. Denote by W* the statistical model associated with the contin¬ 
uous observation of a trajectory from the Gaussian white noise: 


dy t = VW)dt + 


— ._ dW t , 

2 v /7; v /^t) 


t G I \ [ 0 , £ m . 


Then, according with the notation introduced in Section 2.1 and at the begin¬ 
ning of Section J h we have 

A< 2^sup (A m (f) + B m (f)). 


Proof. As a preliminary remark observe that Wjff is equivalent to the model 
that observes a trajectory from: 

dy t = \JJ(f)g(t)dt + dW t , t G / \ [0, £ m j. 

Let us denote by Yj the increments of the process ( y t ) over the intervals Jj, 
j = 2, ...,m, i.e. 

Yj ■■= Vv, - y Vj ^ ^/f(ji)Mdy), ppj 


and denote by .Aj n the statistical model associated with the distributions of 
these increments. As an intermediate result, we will prove that 

YKn) < 2a /%, sup B m (f ), for all m. (17) 


To that aim, remark that the experiment -yY m is equivalent to observing m— 1 
independent Gaussian random variables of means Sj V f (y) u o(dy), 

j = 2, ...,m and variances identically 1, name this last experiment jY*. 
Hence, using also Property Appendix A.l, Facts Appendix A.2 and Ap¬ 
pendix A.5 we get: 


< A(^Y m ,^Y#) < 


f 2 VTf 


§ V VMJj) Jjj 


Vf(y)Mdy) - 2y fTniiJj 


Since it is clear that 8{W# ,jY m ) = 0, in order to bound A{jY m ,W^) it 
is enough to bound 8(^£Y m , ). Using similar ideas as in [9] Section 8.2, we 

define a new stochastic process as: 


777. f 777 

y «' = E f J/ v Ay)Mdy) + -f=J2dMf)B j (t), te/\[ O.eJ, 

j =2 de m J-n j=2 
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where the ( Bj(t )) are independent centered Gaussian processes independent 
of (W t ) and with variances 

Var = j V j (y)u 0 (dy) - ( j Vj(y)u 0 (dy)\ . 

These processes can be constructed from a standard Brownian bridge {B(s), s G 
[0,1]}, independent of (Wt), via 

Bi(t) = B^j' Vi(y)u Q (dy)^. 

By construction, (Y*) is a Gaussian process with mean and variance given 
by, respectively: 

TTL pf m / p \ pt 

E K*] = E K'] / v i(y)M d y) = ^2( / \z7(jj)M d y)) / Vj(y)M d v ), 


3 =2 


3 =2 


( 


Var[y t *] = Var K] ( J Vj(y)M d y)j + M-WVMBjit)) 


3 =2 


2 ^ m 


ft ™ 


4T f 


n J £m j —2 


^2M J j) v j(y)M d y) 


AT, 


3 =2 

1 ^z/ 0 (d 1 /) = Z/o([£m ’ t]) 




47b 


One can compute in the same way the covariance of (E t *) finding that 
Cov(y;,y,*)= Vs <i, 

We can then deduce that 

V = f VJjv)Md») + f t € I\ [0,£„J, 

-'em -'em ^V-^n 

where (W t *) is a standard Brownian motion and 


\/7n 


(x := 


E 

J=2 


Vf(y)M d y) )Vj-(a:). 


Applying Fact Appendix A.6, we get that the total variation distance 
between the process (Yt)tei\[o,em] constructed from the random variables Yj, 
j — 2,... ,m and the Gaussian process (|/t)te/\[o,£ m ] is bounded by 



V f(y)) 2 "o( d y), 


which gives the term in A m (f). 


□ 
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Lemma 4.5. In accordance with the notation of Lemma 4-4> we have: 


A(jri ,w:°) = o( sup ,/r„ / (\/ 7 w - 1)V„(*) 

V/e^ w 


( 18 ) 


Proof. Clearly 8(Wff 0 ,W*) = 0. To show that 8(W#,Wff 0 ) —>■ 0, let us 
consider a Markov kernel K* from (7(7 \ [0,e m ]) to C(J) defined as follows: 
Introduce a Gaussian process, with mean equal to t and covari¬ 

ance 

j-e m i 

-I[o,s]n[o ,t]{z)dz. 


In particular, 


Cov(5T, S t m ) = 


Var(P t m ) = 


4T n g(s ) 

f* 1 




vis. 


Consider it as a process on the whole of / by defining Bf 1 = B’ff \/t > £ m . 
Let oj t be a trajectory in C(I\ [0, £ m ]), which again we constantly extend to a 
trajectory on the whole of /. Then, we define K* by sending the trajectory 
oj t to the trajectory uj t + B™. If we define W n as the law induced on C(I ) by 


dy t = h(t)dt + 


dW t 


2 a jT n g{t)' 


tel, h(t ) = 


1 f 6 [0, £ m ] 

\fffl) tei\[o,£ m ], 


then i^ # W{| A[0 , £m ] = W n , where is dehned as in (8). By means of Fact 
Appendix A.6 we deduce (18). □ 


Proof of Theorem 2.5. The proof of the theorem follows by combining the 
previous lemmas together: 

• Step 1: Let us denote by the statistical model associated with the 

family of probabilities (Pj7 G <#’). Thanks to Property 

Appendix A.l, Fact Appendix A.2 and Theorem Appendix A. 13 we 
have that 

AW, W) < W sup »(/,/„). 

V 2 / e jr 

• Step 2: On the one hand, thanks to Lemma 4.1, one has that the sta¬ 
tistical model associated with the family of probability (pjd . 

G J^ - ) is equivalent to 22? m . By means of Lemma 4.2 we can bound 

A(22? m , Jf m ). On the other hand it is easy to see that S (, 22? m ) = 0. 
Indeed, it is enough to consider the statistics 

S' : x ^ WAav), • • •, Ij m ( A ^r) 

' r<T n r<T n 
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since the law of the random variable Yl r <r (Ax r ) under pjf 
is Poisson of parameter T n jj f m (y)i/ 0 (dy) for all j = 2,... , m. Finally, 

Lemmas 4.1 and 4.3 allows us to conclude that ,^° m ) = 0. Col¬ 

lecting all the pieces together, we get 

A (^n,m^rn) < sup J— f (f(y) - f m (y)) 2 v 0 (dy). 


• Step 3: Applying Theorem Appendix A.9 and Fact Appendix A.3 we 
can pass from the Poisson approximation given by to a Gaussian 
one obtaining 


A(«£? m ,^) = C sup 


E 


< C. 


E 


2k 


= C^ 


(m — 1)2 k 


fe* \ T n u(Jj) ^ j-* T n u 0 (Jj ) 

Step 4: Finally, Lemmas 4.4 and 4.5 allow us to conclude that: 


A(^°, W?) = O ( jT n sup (A m (f) + B m (f) + C m ) 
\ }&& 


T n Hr, 


o ( \fr n sup 

V V J /\[0,e m ] 


(f(y) - fm(y)) 2 Mdy) + 


□ 


4.2. Proof of Theorem 2.6 

Again, before stating some technical lemmas, let us highlight the main 
ideas of the proof. We recall that the goal is to prove that estimating / = A) 
from the discrete observations (X t .)^ =0 of a Levy process without Gaussian 
component and having Levy measure v is asymptotically equivalent to esti¬ 
mating / from the Gaussian white noise model 


dyt = \/W)dt + 


1 

2y/T n g(t) 


dW t , 


dv o 
dLeb’ 


tel. 


Reading P?\ •<=>■ as is asymptotically equivalent to ^*2, we have: 

• Step 1. Clearly (Wjf =0 (X u - X u _f)f =1 . Moreover, (X ti - 

Xt^i <==>■ ( efYi ) where (ef) are i.i.d Bernoulli r.v. with parameter 
a = i m A n e~ lmAn , i m := / A[0£m] f(y)u 0 (dy) and (Yfji are i.i.d. r.v. 

independent of (e*)” =1 and of density -j- with respect to z/o| A[0e 
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• Step 2. (ejYj)j 4^7 M(n; (7 j)fLi), where M(n; (7 j)JLi) is a multino¬ 
mial distribution with ^ = 1 — a and 7* := ais( Jf) i — 2,..., m; 

• Step 3. Gaussian approximation: A4(n; (71 ,... 7 m )) 4^7 <S>JL 2 ^ (2 \jT n u(Jj), 1) 

• Step 4. (g)”i 2 (2 y/T n v{Jj), 1) 4^7 ( : y t ) te i • 

Lemma 4.6. Let z/j, i = 1,2, 6e Lev?/ measures such that ui <C v 2 and 
b\ — b 2 = Jj , <:L 2/(^1 — u 2 )(dy) < 00 . Then, for all 0 < t < 00 , we have: 


Q(b i,0,Mi) _ q(&2,0,^ 2 ) 


tv <J- H (u u u 2 ). 


Proof. For all given t. let K t be the Markov kernel defined as K t {u,A) := 
Ihi(a7), V A G e^(M), V u G D. Then we have: 


lie! 


Q\ 


I TV 


= \\KtPt 


(61,0,2^1) 


,(6i,0,i/i) 


^ r t 


KtP t 

,(62,0,1/2) I 
t 


(62,0,1/2) I 


I TV 


I TV 


< \l-H{ u u u 2 ), 

where we have used that Markov kernels reduce the total variation distance 
and Theorem Appendix A. 13. □ 

Lemma 4.7. Let (P,;)” =1 , (F,)” =1 and be samples of, respectively, Pois¬ 
son random variables random variables with common distribution and 

Bernoulli random variables of parameters Xie~ Xi , which are all independent. 
Let us denote by Q{Y i ,p i ) (resp. Q{y iM )) the law of Y 1 - , (resp., e ?: K iy ). 
Then: 


Q(Yi,Pi) ~ <g)Q W , 




< 2 , 


TV 


£4 




(19) 


i= 1 i= 1 

The proof of this Lemma can be found in [34], Section 2.1. 
Lemma 4.8. Let fff be the truncated function defined as follows: 


= 


1 ifxe [0,e m ] 
f(x) otherwise 


and let zA (resp. is™ s ) be the Levy measure having f(f (resp. f\i\[o,e m ]) as 
a density with respect to is 0 . Denote by the statistical model associ¬ 
ated with the family of probabilities ( ®” =1 : 777 £ an d by 
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£ff s ’ u ° the model associated with the family of probabilities f 0”=! Qu- t ]_ 1 °’°' Urn ^ 
e Jp). Then: 

duo ) 

A(«2£’‘^«2£*’ mb ) = 0. 

Proof. Let us start by proving that 5(J2f' u °, £} r f s ’ u °) = 0. For that, let us 
consider two independent Levy processes, A^ tr and A" 0 , of Levy triplets given 
by (y 1 '™ -1 ' 0 , 0, and (0, 0, z/ 0 |[o,e m ]), respectively. Then it is clear (using 

the Levy-Khintchine formula) that the random variable Xf — X® is a ran¬ 
domization of Xj T (since the law of Xf does not depend on u) having law 

m °’°^ m \ for all t > 0. Similarly, one can prove that <5(i2jj es ’ 1 ' 0 , = 

0. □ 


Proof of Theorem 2.6. As a preliminary remark, observe that the model 
is equivalent to the one that observes the increments of ((ay),pjd ° °v)^, 
that is, the model associated with the family of probabilities ( (££)" =1 



• Step 1: Facts Appendix A.2-Appendix A.3 and Lemma 4.6 allow us 
to write 


n 



q{ Y~^fl,u) 


2=1 



2=1 




< 


TV 





(Vf(y) - l ) 2 M d v)- 


Using this bound together with Lemma 4.8 and the notation therein, we 
get A(J2£°, £? v f SM0 ) < Jn sup /e ^ H (/, ). Observe that v™ is a 

finite Levy measure, hence ^(ay),p£l m ^ is a compound Poisson 
process with intensity equal to i m := ^ , j f(y)vo(dy) and jumps 

size density f or a ll x e / \ [0, e m ] (recall that we are assuming 

that zz 0 has a density g with respect to Lebesgue). In particular, this 

means that Q)f ’ ’ m ; can be seen as the law of the random variable 
Yj where P ?: is a Poisson variable of mean i m A n , independent from 
(Yi)i> o, a sequence of i.i.d. random variables with density y^I/\[o,e rrt ] 
with respect to Lebesgue. Remark also that i m is confined between 
KVo(l \ [o, £m}) and Mu 0 (l \ [0, £ m ]). 
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Let (ej)j>o be a sequence of i.i.d. Bernoulli variables, independent of 
(Yj)j> 0 , with mean t m A n e~ tmA ". For i = 1 ,... ,n, denote by the 
law of the variable e t Y t and by J2 f n the statistical model associated with 
the observations of the vector (eA'i,..., e n Y n ), i.e. 



Furthermore, denote by Q{ the law of Y r Then an application 
of Lemma 4.7 yields: 


(g )Q{-®Q € i J 
1=1 1=1 


< 2i m y/nAl < 2Mu 0 (l\ [0, e m ]) y/nAJy 


Hence, we get: 

= oL„(I\[ 0 ,e m ])VS^Y ( 20 ) 

Here the O depends only on M. 


• Step 2: Let us introduce the following random variables: 

n n 

j= 1 i =1 

Observe that the law of the vector (Z i,..., Z m ) is multinomial A4 (n; 71,, j m ) 
where 


71 = l-L m A n e ^ A ", 7* 


A„e tmAn z/(Jj), i — 2,... ,m. 


Let us denote by M. n the statistical model associated with the observa¬ 
tion of (Z 1 ,..., Z m ). Clearly Ai n ) = 0. Indeed, AA n is the image 

experiment by the random variable S :/”—)■ { 1 ,..., n} m defined as 

S(xi,...,x n ) = (#{j : Xj = 0};#{j : x 3 e : Xj G J m }), 

where denotes the cardinal of the set A. 

We shall now prove that 6(A4 n ,<^^) < sup ^JnA n H' 2 (f, f m ). We 
start by defining a discrete random variable X* concentrated at the 
points 0, x*, i = 2 ,..., m: 


P(A* = y) 


li if V = x*, i = 1 ,... ,m, 
0 otherwise, 
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with the convention x\ = 0. It is easy to see that A4 n is equivalent to 
the statistical model associated with n independent copies of X*. Let 
us introduce the Markov kernel 


K(x*,A) 


I a (0 ) if i = 1, 

f A Vi(x)uo(dx) otherwise. 


Denote by P* the law of the random variable X* and by Q\' the 
law of a random variable e t Y t where e t is Bernoulli independent of Y % , 
with mean L m A n e~‘" mAn and V has a density ^-1 r\[o, £m ] with respect to 
Lebesgue. The same computations as in Lemma 4.3 prove that KP* = 
Q\’ . Hence, thanks to Remark Appendix A.8, we get the equivalence 
between M. n and the statistical model associated with the observations 
of n independent copies of e^Yi. In order to bound <5(A4 n , it is 
enough to bound the total variation distance between the probabilities 
(S>r=i Qi an( l Qi- Alternatively, we can bound the Hcllinger 

distance between each of the and Ql’, thanks to Facts Appendix 
A.2 and Appendix A.3, which is: 


n n 


Qi 


Q-’ 


i= 1 


i=l 


< 


TV 






1 - 7i 




It follows that 


S(M n , £!„) < y/nA n sup H(f, f m ). 


• Step 3: Let us denote by J\f* n the statistical model associated with 
the observation of m independent Gaussian variables JV (ny*, ny*), i = 
1 ,,m. Very similar computations to those in [9] yield 


A{M n ,N*m) 



m In m \ 

~Vn~ /' 


In order to prove the asymptotic equivalence between M. n and Af m de- 
hned as in (12) we need to introduce some auxiliary statistical models. 
Let us denote by A m the experiment obtained from A/" r * by disregarding 
the first component and by V m the statistical model associated with the 
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multivariate normal distribution with the same means and covariances 
as a multinomial distribution A4(n, Ji, ■.. ,j m ). Furthermore, let us 
denote by J\f# the experiment associated with the observation of m — 1 
independent Gaussian variables ,/F( y^rr/i , |), % = 2 ,... ,m. Clearly 
A(V m , A m ) = 0 for all m: In one direction one only has to consider 
the projection disregarding the first component; in the other direction, 
it is enough to remark that V m is the image experiment of A rn by the 
random variable S : (x 2 , ■ ■ ■, x m ) —> (n( 1 — z " i = 2 l ),x 2 , ■ ■ ■, x m ). More¬ 
over, using two results contained in [9], see Sections 7.1 and 7.2, one 
has that 

A(A m ,AC) = o(yF^ A (An,U*)=o{^j. 

Finally, using Facts Appendix A .2 and Appendix A.5 we can write 


A < 




\ 

< < sJAAmAAi\ [0,<rj)) 3 . 


To sum up, A (M n ,N m ) - + ^uA 2 (uo(/ \ [u : ,,,] )) ’j. with 

the O depending only on k and M. 

• Step 4: An application of Lemmas 4.4 and 4.5 yield 

A {Mm, C°) < 2 Vr n sup (A m (f) + B m (f) + C m (f )). 


□ 


5. Proofs of the examples 

The purpose of this section is to give detailed proofs of Examples 2.4 and 
Examples 3.1-3.3. As in Section 4 we suppose / C M + . We start by giving 
some bounds for the quantities A m (f), B m (f ) and L 2 (f, f m ), the L 2 -distance 
between the restriction of / and f m on / \ [ 0 ,£ m ]. 

5.1. Bounds for A m (f), B m (f), L 2 (f,f m ) when f m is piecewise linear. 

In this section we suppose / to be in ^^kkM) dehned as in (3). We are 
going to assume that the Vj are given by triangular/trapezoidal functions as 
in ( 6 ). In particular, in this case f m is piecewise linear. 



Lemma 5.1. Let 0 < k < M be two constants and let fi, i — 1,2 be functions 
defined on an interval J and such that k < fi < M, i — 1,2. Then, for any 
measure u 0 , we have: 


AM 


{fi{x) - f 2 (x)) 2 u 0 (dx) < J (\/ h(x) - \/Mx)) 2 iyo(dx) 

~ Ik f ~ f 2 ( x ^ 2,/0 ( dx ^ 


Proof. This simply comes from the following inequalities: 

^wf Mx) “ Mx)) ~ Jmlsm = ^ 


< 


2 \[k 


(fl( x ) - f 2 (x)). 


□ 

Recall that x* is chosen so that Jj(x — x*)u 0 (dx) = 0. Consider the 
following Taylor expansions for x G Jy 

f(x) = f(x*) + f'(x*)(x - x*) + Ri(x); f m (x) = f m (x*) + fl(x*)(x - x*), 

where f m (x*) = and f m {x*) is the left or right derivative in x* depending 

whether x < x* or x > x* (as f m is piecewise linear, no rest is involved in its 
Taylor expansion). 

Lemma 5.2. The following estimates hold: 


l-Ri(®)l < K \& - x *V\ x - x * I; 

| f( x *i) - fm( x *i) I < H-RilUooM .for 1 = 2 ,..., rn - 1; 

\m)-m^T h l n}+Klx '~^ lx ~ x '- 


if x E Ji, i — 3,..,, m — 1 
if x G J t , i G {2, m}. 


for some constant C and points fi G Ji, r)i G J*_i U J* U J l+ \, r 2 G J 2 U J 3 
and T m G Jm— 1 U J m . 


Proof. By definition of R,, we have 


Ri( x )\= (f(^-f( x i))( x ~ x i) <K\fi- x *V\x-x 
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for some point & G ■J l . For the second inequality, 


I f(x*) - fm(x*) I = 


^0 ( Ji ) 

1 


(f( x *) - f{x))v 0 (dx) 


v 0 (Ji) 


R i(x)u 0 (dx) 


'Ji 


— miLM, 


where in the hrst inequality we have used the defining property of x*. For the 
third inequality, let us start by proving that for all 2 < i < m — 1, f' n (x*) = 
f'(Xi ) f° r some Xi £ Ji U J t+ \ (here, we are considering right derivatives; for 
left ones, this would be Jj_i U Jj). To see that, take x G Ji fl [x*,x* +1 ] and 
introduce the function h(x) := f(x) — l{x) where 


l(x) = 


X — Xi 


x i+1 


Xi 


(. fm( x i+l ) - + / m (x*). 


Then, using the fact that fj (x—x*)u 0 (dx) = 0 joint with fj (x—x* +1 )u 0 (dx) = 
(a£+i - x*)n m , we get 



h(x)u 0 (dx) = 0 



h(x)u 0 (dx). 


In particular, by means of the mean theorem, one can conclude that there 
exist two points p t G Ji and pi+\ G •/,;+1 such that 


KPi) 


fj h(x)u 0 (dx) 

Vo (Ji) 


f Ji+1 Hx)vo(dx) 

v 0 (Ji+i) 


h(Pi+1 )- 


As a consequence, we can deduce that there exists Xi £ \puPi+ i] T U J, :+1 
such that h'(xj) = 0, hence f'(xi ) = £ 7 (Xi) = When 2 < i < m — 1, 

the two Taylor expansions joint with the fact that f m {x*) = f'(Xi) f° r some 
Xi G Ji U J i+ 1 , give 


1/0*0 - fm(x) | < |/«) - fm(x*i) | + |i2i(x)| + A>* - Xi|‘> - X*| 

< 2 II^IUoo(^o) + K \ x * ~ XiV\ x - x i I 

whenever x £ Ji and x > x* (the case x < x* is handled similarly using the 
left derivative of f m and G J*_i U Jj). For the remaining cases, consider for 
example i — 2. Then f m {x) is bounded by the minimum and the maximum 
of / on J 2 U J 3 , hence f m {x) = /(r) for some r G J 2 U J 3 . Since /' is bounded 
by C = 2 M + K, one has |/(x) — / m (x)| < C\x — r |. □ 
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Lemma 5.3. With the same notations as in Lemma 5.2, the estimates for 
B m(f) and L 2 (fJ m ) 2 are as follows: 

i / rn n 2 

ivif, fmf < / ( 2 II^IUooM + K \ x i - ,f iiV\ x - <l) M dx ) 

V i =3 JJi 

+ / \ x — T 2 \ 2 Mdx) + 

AlU) = = o(i 2 (/,/,„) 2 ) 

/ m 1 

B 2 m (J) = OHT - 7 =i'o(^ i )(2v / M + l) 2 ||ft|lL(„) 

\ \/ rC 

X 2=2 v 

Proof. The Z^-bound is now a straightforward application of Lemmas 5.1 
and 5.2. The one on A m (f ) follows, since if / G LrkM) then yff G 

K . In order to bound B? n (f) write it as: 



3=1 


fjj VJMMdy) I u{jj 




MJj 


=: E 

3=1 


By the triangular inequality, let us bound Ej by F-. + Gj where: 


Fj = 


HJj) 

MJj 


f(x* 


and Gj = 


fM) 


fj , VJM)M d y) 


MJj 


Using the same trick as in the proof of Lemma 5.1, we can bound: 

fj, ( f ( x ) - f ( x *)) Mdx) 


Fj < 2 y/M 


MJj 


< 2 VM\\Rj\\ Loo („ 0 ). 


On the other hand, 
1 


Gj = (T 

MJj 


MJj 


^ i\/fMj) - MJWjMdy) 

f f f'M) 

L m^Jm) 


{x-x)) + Rj{y ) )Mdy) 




which has the same magnitude as ^ ||-Ry ||z. oo (^ 0 )- 


□ 
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Remark 5.4. Observe that when vq is finite, there is no need for a special 
definition of f m near 0, and all the estimates in Lemma 5.2 hold true replacing 
every occurrence of i — 2 by i — 1. 

Remark 5.5. The same computations as in Lemmas 5.2 and 5.3 can be 
adapted to the general case where the V)’s (and hence f m ) are not piece- 
wise linear. In the general case, the Taylor expansion of f m in x* involves a 
rest as well, say Ri, and one needs to bound this, as well. 


5.2. Proofs of Examples 2.4 

In the following, we collect the details of the proofs of Examples 2.4. 

1. The finite case: u 0 = Leb([0,l]). 

Remark that in the case where vq if finite there are no convergence prob¬ 
lems near zero and so we can consider the easier approximation of /: 


rnO\ if x £ [0, x*], 

fm(x) := m 2 [9 j+ i(x - x*) + 9j(x* j+1 - x)] if x G (x*,x* +1 ] j = 1, ... ,m - 1, 

m9 m if x G {x* m , 1] 


where 


„* _ 2j - 1 T _ fj - 1 j 

X-i „ s Jj ( , 

V m m 


, 9j = / f(x)dx, j = 1,... ,m. 


3 2m 

In this case we take e m = 0 and Conditions (C 2) and (C 2') coincide: 
lim nA n sup (A 2 m (f) + B 2 n (f)) = 0. 

l->00 f (z \ / 


n—yoo 

Applying Lemma 5.3, we get 


sup (L 2 (f,f m ) + A m (f) + B m (f))=0(m *+m 1 7 ); 
fe& v J 


(actually, each of the three terms on the left hand side has the same rate of 
convergence). 

2. The finite variation case: &{*) = x 1I[ [o,i]( x )- 
To prove that the standard choice of V 3 described at the beginning of 
r 3 (lx 

Examples 2.4 leads to / V)(x)— = 1, it is enough to prove that this integral 


x 


/ l m 

-m j = 2 


is independent of j, since in general 

observe that, for j = 3,..., m — 1, 

b* x - x*_ x dx 


dx mi 

fix)— = m — 1. To that aim 

x 


Em / Vj(x)u 0 (dx) = 


'em 


3-1 3 


X* — X 


j -1 


X 


+ 


b +1 x j+1 


X dx 


rp 1 *' _ rp 1 *' rp 

X j +1 X 
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Let us show that the first addendum does not depend on j. We have 



1 and 


X U r* 5 dx xU ln /^-iy 

rp* rp* / rp rp ^ rp ^ V rp* ) 

■' j *3-1 ■ •' •'./ ■'./ i v ■' .i 7 


Since x* = — 1 and Vj = £m 1 , the quantities and, hence, * 3 / do 

J J _ i Xj _ 

not depend on j. The second addendum and the trapezoidal functions V 2 
and V m are handled similarly. Thus, f m can be chosen of the form 


fm(x) ■■= { 


1 

v {Jl) 

f-f-m 

1 


v(Jm) 




f-^m 


h j +1 


' f-Im 


lfxe [0, £ m ] , 

if X e (e m ,x* 2 ], 

if x G (x*,x* j+1 ] j = 2, 

if X e (x* m , 1]. 


A straightforward application of Lemmas 5.2 and 5.3 gives 



(f(x) - fm(x)^ Vo(dx)+A m (f) + B m (f) 



as announced. 

3. The infinite variation, non-compactly supported case: ^\(x) = 
^ -2 Ir + (A)- Recall that we want to prove that 


L 2 (fJ m ) 2 + A 2 m (f) + B 2 m (f) 


(H(m) 3+ 47 

V (£m r m) 2 't 


f(x) 2 

x>H(m) H(m) 


5 


for any given sequence H{m ) going to infinity as m —>■ oo. 

Let us start by addressing the problem that the triangular/trapezoidal 

A 

choice for Vj is not doable. Introduce the following notation: Vj — Vj + A,-, 

A 

j = 2,.. . ,m, where the Vj’s are triangular/trapezoidal function similar to 

A 

those in (6). The difference is that here, since x* m is not defined, Kn._i is 
a trapezoid, linear between x* n _ 2 and x* n _ l and constantly equal to — on 

A 

[ x *m-\i v m- 1 ] and V m is supported on [u m _i,oo), where it is constantly equal 
to —. Each Aj is chosen so that: 

1. It is supported on [x*_ ,, a;* +1 ] (unless j = 2, j — rn — 1 or j = m ; in the 
first case the support is in the second one it is [x’^ n _ 2 , x* m _^\, 

and A m = 0); 


• -,m- 1, 
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2. Aj coincides with — A,_i on [x*_i, a;*], j = 3,..., m — 1 (so that ^ Vj = 
—) and its first derivative is bounded (in absolute value) by — , / « —t 

(so that V.j is non-negative and bounded by —); 

v J l^n ' 

3. Aj vanishes, along with its first derivatives, on x *_ x , x* and x* +l . 


We claim that these conditions are sufficient to assure that f m converges to 
/ quickly enough. First of all, by Remark 5.5, we observe that, to have a 
good bound on L 2 (f, f m ), the crucial property of f m is that its first right 
(resp. left) derivative has to be equal to — r~A (resp. — , / , —A and 

its second derivative has to be small enough (for example, so that the rest 
Rj is as small as the rest Rj of / already appearing in Lemma 5.2). 

The (say) left derivatives in x* of f m are given by 


hyp = (p(y)+y(y))p(^)-y^-,)); hyp = .yyppt./p-y.y,)). 


Then, in order to bound |/^(x*)| it is enough to bound \A"Ax*)\ because: 


/"(**) < \A"j{x*j)\ 


dx 


x z 


'Jj -1 


nqr 

f(x )-i < \A , '(x*)\sup\f'(x)\( y £ j +£j^i)fi m , 

x x£l 


where ij is the Lebesgue measure of Jj. 

We are thus left to show that we can choose the Aj-’s satisfying points 
1-3, with a small enough second derivative, and such that J f Vj(x)- § = 1. To 
make computations easier, we will make the following explicit choice: 


Aj(x) = bj(x — x*) 2 (x — x *_i) 2 Vx G [x*_ 1 ,x*j), 

for some bj depending only on j and m (the definitions on [x *, x * + ,) are 
uniquely determined by the condition Aj + A,- +1 = 0 there). 

Define j max as the index such that H{m) G Jj max ; it is straightforward to 
check that 


3max. ~ m - £m ^ l ( ^ ; x* m _ k = e ro (m-l)log (l + y), k = 1, 
Hym) V kJ 

One may compute the following Taylor expansions: 


,m — 2. 


A 115 / 1 \ 

L - F + oy, 

m — fc — 1 

f x m-fc+l A 111 / 1 \ 

V m - t (x)u 0 (dx) = § S + °(p)- 

L m — k 


34 



In particular, for m 3> 0 and m — k < j max , so that also k 0, all the 

X* A 

integrals f x * +1 Vj(x)v 0 (dx) are bigger than 1 (it is immediate to see that the 

3 A 

same is true for V 2 , as well). From now on we will fix a k > and let 
j — m — k. 


Summing together the conditions f T Vi(x)u 0 (dx ) = 1 Vi > j and noticing 
that the function Y^iLj ^ constantly equal to -J- on [x*, 00 ) we have: 


1 


Aj(x)u 0 (dx ) —m — j + 1- v Q ([x*, 00 )) 


i—i 


= k + 1 


hm 

1 


X 3 A 

Vj{x)v Q (dx) 


i-i 


M 1 + |) 


6 k 


k 2 




4/c 


A; 2 


Our choice of A,- allows us to compute this integral explicitly: 
// 6 i(* - ^-i) 2 ( x - “ 1 )) 3 (^ + 

Jx j~i 

In particular one gets that asymptotically 


1 3 4 1 ^ ( k y 

3 (£m(m — l)) 3 2 4 k \£ m m) 


This immediately allows us to bound the first order derivative of Aj as asked 
in point 2: Indeed, it is bounded above by 2b 3 £ 3 _ 1 where i 3 ~\ is again the 
length of Jj_ 1 , namely £j = It follows that for m big enough: 


sup \A'j (x) | < — < — 

x£l k fi r 


.{x*-x*) \e 


k 




2 


The second order derivative of Aj(x) can be easily computed to be bounded 
by 4:bj£ 2 . Also remark that the conditions that |/| is bounded by M and that 
f is Holder, say | fix) — f'(y)\ < K\x — y | 7 , together give a uniform L^ 
bound of | f'\ by 2 M + K. Summing up, we obtain: 




k 3 £ r 




(here and in the following we use the symbol < to stress that we work up to 
constants and to higher order terms). The leading term of the rest Rj of the 
Taylor expansion of f m near x* is 



2 






£ m m 


k 7 
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Using Lemmas 5.2 and 5.3 (taking into consideration Remark 5.5) we 
obtain 


Jmax 


I f(x) ~ fm(x)\ 2 Vq(cLx) < ^2 / I f( X ) - fm(x)\ 2 U 0 (dx) + / | f(x) ~ f m (x)\ 2 U 0 (dx 


j =2 ' J -< 


< 

r^j 


yy / i?; 

L_ £mm 

,C— if(m) 


(£ m m ) 2+27 (e m m) 


£; 4 + 47 


+ 


/c 14 


H(m) 


+ 


sup /(x) 


Hfm) x>H(m) 

( 21 ) 


< 

r^j 


H(m ) 3+47 H(m) 13 \ 1 

3” / \ 1 n I ~L 


(£ TO m) 2+27 (£ m m) 10 y H(m)' 

It is easy to see that, since 0 < 7 < 1, as soon as the first term converges, it 
does so more slowly than the second one. Thus, an optimal choice for H{m) 
is given by ^e m m, that gives a rate of convergence: 

L 2 (fJmf < * • 

y/£m‘m 

This directly gives a bound on H(f, f m ). Also, the bound on the term A m (f), 

which is L 2 (v 7 , Vf m ) 2 , follows as well, since / G ^I u k,k,m) im pl ies Vf e 
K /s y M) • Finally, the term £> 2 ,(/) contributes with the same rates as 


those in (21): LIsing Lemma 5.3, 


I H (m) I 

B m(f) ^ X] ^(^)II^IIL + U)([# M> OO)) 

1=2 

< ^ /%m\ 2 +27 1 

~ dm 2_^ \ k 2 ) Him) 

b .— £ m (m — 1) V 7 

JL(m) 

H (m) :3+47 1 

~ (£ m m ) 2+27 Hfm) 

5.3. Proof of Example 3.1 

In this case, since £ m = 0, the proofs of Theorems 2.5 and 2.6 simplify 
and give better estimates near zero, namely: 

A (0*$r, O < Ci (yf n sup (An(f) + B m (f) + L 2 (f, /„)) + 7(0 

A(^7, < c 2 (+ v/5; sup (a»(/) + B m (f) + (/, /■»))), 

V V ' n v 2 / 

( 22 ) 
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where C\, C 2 depend only on k, M and 


A m(f) = J o {y/ljy) ~ VJ&)) dy, B m (f) = Y (y™ fj 3/Yy) d v ~ 

As a consequence we get: 

A (&>%&,, w:°) < o(yr n (m-i + . 

To get the bounds in the statement of Example 3.1 the optimal choices are 

1 2 

m n = Tn +1 when 7 < \ and m n = T „ 5 otherwise. Concerning the discrete 
model, we have: 

A ($$v, W n u °) < O ( . 

There are four possible scenarios: If 7 > \ and A n = n _/3 with \ < (3 < \ 
(resp. (3 > |) then the optimal choice is m n = n 1-/3 (resp. m n = n^). 

If 7 > | and A n = n _/3 with \ < j3 < (resp. (3 > |±|*) then the 

2-/3 

optimal choice is m n = n 4 + 2 t (resp. m n = n 1_/3 ). 


5.4. Proof of Example 3.2

As in Example 2.4, we let $\varepsilon_m = m^{-1-\alpha}$ and consider the standard triangular/trapezoidal $V_j$'s. In particular, $f_m$ will be piecewise linear. Condition (C2’) is satisfied and we have $C_m(f) = O(\varepsilon_m)$. This bound, combined with the one obtained in (7), allows us to conclude that an upper bound for the rate of convergence of $\Delta(\mathscr Q_{n,FV}^{\nu_0}, \mathscr W_n^{\nu_0})$ is given by:

A(^ Fr ,K n ) <c( V / v^e ro +\/^(h^)V^^+ v ^A|ln(£- 1 )), 

\ \ TYl / a / Tl J 


where C is a constant only depending on the bound on A > 0. 

The sequences $\varepsilon_m$ and $m$ can be chosen freely to optimize the rate of convergence. It is clear from the expression above that, if we take $\varepsilon_m = m^{-1-\alpha}$ with $\alpha > 0$, larger values of $\alpha$ reduce the first term $\sqrt{n^2\Delta_n\varepsilon_m}$, while changing the other terms only by constants. It can be seen that taking $\alpha > 15$ is enough to make the first term negligible with respect to the others. In that case, and under the assumption $\Delta_n = n^{-\beta}$, the optimal choice for $m$ is $m = n^\delta$ for a suitable exponent $\delta$ depending on $\beta$. The global rate of convergence is then


A(J2 


v 0 

n,FVi 



O (n 2 /3 In n) if \ < /3 < p 
Inn) if p < (3 < 1. 
In the same way one can find


= o(y^)V ^ )+ ^- )+ V^s m ). 

As above, we can freely choose $\varepsilon_m$ and $m$ (possibly in a different way from above). Again, as soon as $\varepsilon_m = m^{-1-\alpha}$ with $\alpha \ge 1$, the third term plays no role, so that we can choose $\varepsilon_m = m^{-2}$. Letting $\Delta_n = n^{-\beta}$, $0 < \beta < 1$, and
m = n s , an optimal choice is S = 1 . giving 

^Zfv^u) = 0(^(lnn) § ) =0(l^®(lnT B )*). 


5.5. Proof of Example 3.3 

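Using the computations in (21), combined with $(f(y) - f_m(y))^2 \le 4\exp(-2\lambda_0 y^3) \le 4\exp(-2\lambda_0 H(m)^3)$ for all $y \ge H(m)$, we obtain:

$$\int_I |f(x) - f_m(x)|^2\,\nu_0(dx) \lesssim \frac{H(m)^7}{(\varepsilon_m m)^4} + \int_{H(m)}^{\infty} |f(x) - f_m(x)|^2\,\nu_0(dx) \lesssim \frac{H(m)^7}{(\varepsilon_m m)^4} + \frac{e^{-2\lambda_0 H(m)^3}}{H(m)}.$$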


As in Example 2.4, this directly bounds $H^2(f, f_m)$ and $A_m^2(f)$. Again, the first part of the sum appearing in $B_m^2(f)$ is asymptotically smaller than the one appearing above:

$$B_m^2(f) = \sum_{j=1}^m \bigg(\int_{J_j} \frac{\sqrt{f(y)}}{\sqrt{\nu_0(J_j)}}\,\nu_0(dy) - \sqrt{\int_{J_j} f(x)\,\nu_0(dx)}\bigg)^2 \lesssim \frac{H(m)^7}{(\varepsilon_m m)^4} + e^{-\lambda_0 H(m)^3}.$$

As above, for the last inequality we have bounded $f$ on each $J_k$ contained in $[H(m), \infty)$ by $\exp(-\lambda_0 H(m)^3)$. Thus the global rate of convergence of $L_2(f, f_m)^2 + A_m^2(f) + B_m^2(f)$ is

$$\frac{H(m)^7}{(\varepsilon_m m)^4} + e^{-\lambda_0 H(m)^3}.$$
Concerning C m (f), we have C^(f) = (1} dx < £ b m . To write

the global rate of convergence of the Le Cam distance in the discrete setting, we make the choice $H(m) = \big(\frac{\eta}{\lambda_0}\ln m\big)^{1/3}$, for some constant $\eta$, and obtain:


A = O 


( \/nA n + mln m + — ((In m 


\ £r 


n 


7 — R 

6 m 2 
+ 


{e m m) 2 \/\nrn 


+ ^n 2 A n ef n 


Letting A n = n P, e m — n a and m = n 5 , optimal choices give a = f and 
6 = 7 ; + We can also take 77 = 2 to get a final rate of convergence: 


A(jc,^r°) = 

In the continuous setting, we have 


J 0 (n2 3^) ;f 3 ^ ^ 12 


1 4 ^ ^ ^ 13 
0(n~^ + ^(\nn)^) if j§ < ft < 1. 


A(^ 0 ,^ 0 ) = ofy^A 


V 


7 _v 

6 777 2 

H- „ , + £m ) + 


(In 777 
(e m 777 ) 2 ' ^ln 777 


I e m m' 
77 A„ 


Using T n = 77 A n , e m = T n a and 777 = , optimal choices are given by 

a = t», 5 = A; choosing any 77 > 3 we get the rate of convergence 


A(^°,C°) = 0(T n 34 (lnT n )s). 
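In the proof of Example 3.3 the cutoff is taken, as we read it, of the form $H(m) = \big(\frac{\eta}{\lambda_0}\ln m\big)^{1/3}$; the point of this choice is that the exponential tail term becomes polynomial, $e^{-\lambda_0 H(m)^3} = m^{-\eta}$, so it can be traded off against the polynomial terms. A minimal numerical check of that identity, with illustrative values of $\eta$ and $\lambda_0$:

```python
import math


def tail_cutoff(m: float, eta: float, lambda0: float) -> float:
    """H(m) = ((eta / lambda0) * ln m) ** (1/3)."""
    return ((eta / lambda0) * math.log(m)) ** (1.0 / 3.0)


def tail_term(m: float, eta: float, lambda0: float) -> float:
    """exp(-lambda0 * H(m)^3), which should equal m ** (-eta)."""
    h = tail_cutoff(m, eta, lambda0)
    return math.exp(-lambda0 * h ** 3)


for m in (10.0, 1e3, 1e6):
    print(m, tail_term(m, 2.0, 0.7), m ** -2.0)
```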


Appendix A. Background 

Appendix A.1. Le Cam theory of statistical experiments

A statistical model or experiment is a triplet $\mathscr P_j = (\mathscr X_j, \mathscr A_j, \{P_{j,\theta};\ \theta \in \Theta\})$, where $\{P_{j,\theta};\ \theta \in \Theta\}$ is a family of probability distributions all defined on the same $\sigma$-field $\mathscr A_j$ over the sample space $\mathscr X_j$ and $\Theta$ is the parameter space. The deficiency $\delta(\mathscr P_1, \mathscr P_2)$ of $\mathscr P_1$ with respect to $\mathscr P_2$ quantifies “how much information we lose” by using $\mathscr P_1$ instead of $\mathscr P_2$, and it is defined as $\delta(\mathscr P_1, \mathscr P_2) = \inf_K \sup_{\theta \in \Theta} \|KP_{1,\theta} - P_{2,\theta}\|_{TV}$, where TV stands for “total variation” and the infimum is taken over all “transitions” $K$ (see [32], page 18). The general definition of transition is quite involved but, for our purposes, it is enough to know that Markov kernels are special cases of transitions. By $KP_{1,\theta}$ we mean the image measure of $P_{1,\theta}$ via the Markov kernel $K$, that is

$$KP_{1,\theta}(A) = \int_{\mathscr X_1} K(x, A)\,P_{1,\theta}(dx), \qquad \forall A \in \mathscr A_2.$$
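On finite sample spaces a Markov kernel is just a row-stochastic matrix, and the image measure $KP_{1,\theta}$ is a vector-matrix product. The sketch below assumes that finite setting; the two toy experiments (two Bernoulli($\theta$) trials versus a single one) and both kernels are illustrative choices, not taken from the paper. For a fixed kernel $K$, $\sup_\theta \|KP_{1,\theta} - P_{2,\theta}\|_{TV}$ is an upper bound on $\delta(\mathscr P_1, \mathscr P_2)$; the code approximates that supremum on a grid of $\theta$ values.

```python
# Finite sample spaces only: a Markov kernel is a row-stochastic matrix,
# row x being the probability vector K(x, .).


def push_forward(p: list[float], kernel: list[list[float]]) -> list[float]:
    """Image measure KP(A) = sum_x K(x, A) P({x}) on a finite space."""
    n_out = len(kernel[0])
    return [sum(p[x] * kernel[x][a] for x in range(len(p))) for a in range(n_out)]


def tv(p: list[float], q: list[float]) -> float:
    """Total variation distance sup_A |P(A) - Q(A)| = (1/2) sum |p - q|."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def two_trials(theta: float) -> list[float]:
    """Law of the number of successes in two Bernoulli(theta) trials (experiment 1)."""
    return [(1 - theta) ** 2, 2 * theta * (1 - theta), theta ** 2]


def one_trial(theta: float) -> list[float]:
    """Law of a single Bernoulli(theta) trial (experiment 2)."""
    return [1 - theta, theta]


# Randomized kernel: keep 0 and 2, flip a fair coin when exactly one success occurred.
RANDOMIZED = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
# Deterministic kernel K(x, A) = 1_A(S(x)) with S(x) = 1{x = 2}.
DETERMINISTIC = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]


def sup_tv(kernel: list[list[float]], thetas: list[float]) -> float:
    """Grid approximation of sup_theta ||K P_{1,theta} - P_{2,theta}||_TV."""
    return max(tv(push_forward(two_trials(t), kernel), one_trial(t)) for t in thetas)


THETAS = [i / 100 for i in range(101)]
print(sup_tv(RANDOMIZED, THETAS), sup_tv(DETERMINISTIC, THETAS))
```

The randomized kernel reproduces the one-trial experiment exactly ($KP_{1,\theta} = P_{2,\theta}$ for every $\theta$), so the deficiency is zero here, while the deterministic kernel leaves a gap of $\theta - \theta^2$, which peaks at $1/4$.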
The experiment $K\mathscr P_1 = (\mathscr X_2, \mathscr A_2, \{KP_{1,\theta};\ \theta \in \Theta\})$ is called a randomization of $\mathscr P_1$ by the Markov kernel $K$. When the kernel $K$ is deterministic, that is $K(x, A) = \mathbb I_A(S(x))$ for some random variable $S: (\mathscr X_1, \mathscr A_1) \to (\mathscr X_2, \mathscr A_2)$, the experiment $K\mathscr P_1$ is called the image experiment by the random variable $S$. The Le Cam distance is defined as the symmetrization of $\delta$, namely $\Delta(\mathscr P_1, \mathscr P_2) = \max\big(\delta(\mathscr P_1, \mathscr P_2), \delta(\mathscr P_2, \mathscr P_1)\big)$, and it defines a pseudometric. When $\Delta(\mathscr P_1, \mathscr P_2) = 0$ the two statistical models are said to be equivalent. Two sequences of statistical models $(\mathscr P_1^n)_{n \in \mathbb N}$ and $(\mathscr P_2^n)_{n \in \mathbb N}$ are called asymptotically equivalent if $\Delta(\mathscr P_1^n, \mathscr P_2^n)$ tends to zero as $n$ goes to infinity. A very interesting feature of the Le Cam distance is that it can also be translated in terms of statistical decision theory. Let $\mathscr D$ be any (measurable) decision space and let $L: \Theta \times \mathscr D \to [0, \infty)$ denote a loss function. Let $\|L\| = \sup_{(\theta, z) \in \Theta \times \mathscr D} L(\theta, z)$. Let $\pi_i$ denote a (randomized) decision procedure in the $i$-th experiment. Denote by $R_i(\pi_i, L, \theta)$ the risk from using procedure $\pi_i$ when $L$ is the loss function and $\theta$ is the true value of the parameter. Then, an equivalent definition of the deficiency is:

$$\delta(\mathscr P_1, \mathscr P_2) = \sup_{\pi_2} \inf_{\pi_1} \sup_{\theta \in \Theta} \sup_{L:\, \|L\| = 1} \big|R_1(\pi_1, L, \theta) - R_2(\pi_2, L, \theta)\big|.$$


Thus $\Delta(\mathscr P_1, \mathscr P_2) < \varepsilon$ means that for every procedure $\pi_i$ in problem $i$ there is a procedure $\pi_j$ in problem $j$, $\{i, j\} = \{1, 2\}$, with risks differing by at most $\varepsilon$, uniformly over all losses $L$ with $\|L\| \le 1$ and all $\theta \in \Theta$. In particular, when minimax rates of convergence in a nonparametric estimation problem are obtained in one experiment, the same rates automatically hold in any asymptotically equivalent experiment. There is more: when explicit transformations from one experiment to another are obtained, statistical procedures can be carried over from one experiment to the other one.

There are various techniques to bound the Le Cam distance. We report 
below only the properties that are useful for our purposes. For the proofs 
see, e.g., [32, 47]. 

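Property Appendix A.1. Let $\mathscr P_j = (\mathscr X, \mathscr A, \{P_{j,\theta};\ \theta \in \Theta\})$, $j = 1, 2$, be two statistical models having the same sample space and define $\Delta_0(\mathscr P_1, \mathscr P_2) := \sup_{\theta \in \Theta} \|P_{1,\theta} - P_{2,\theta}\|_{TV}$. Then, $\Delta(\mathscr P_1, \mathscr P_2) \le \Delta_0(\mathscr P_1, \mathscr P_2)$.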

In particular, Property Appendix A.1 allows us to bound the Le Cam distance between statistical models sharing the same sample space by means of classical bounds for the total variation distance. To that end, we collect below some useful results.

Fact Appendix A.2. Let $P_1$ and $P_2$ be two probability measures on $(\mathscr X, \mathscr A)$, dominated by a common measure $\xi$, with densities $g_i = \frac{dP_i}{d\xi}$, $i = 1, 2$. Define

$$L_1(P_1, P_2) = \int_{\mathscr X} |g_1(x) - g_2(x)|\,\xi(dx), \qquad H(P_1, P_2) = \bigg(\int_{\mathscr X} \big(\sqrt{g_1(x)} - \sqrt{g_2(x)}\big)^2\,\xi(dx)\bigg)^{1/2}.$$

Then,

$$\|P_1 - P_2\|_{TV} = \frac12 L_1(P_1, P_2) \le H(P_1, P_2). \qquad (A.1)$$
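For measures on a finite set, with counting measure as the dominating measure, the quantities of Fact Appendix A.2 are plain sums. The sketch below (helper names are illustrative) uses the convention without a factor $1/2$ in $H$, as in the definition above, and checks the inequality $\frac12 L_1 \le H$ on randomly drawn probability vectors.

```python
import math
import random


def l1(p, q):
    """L_1 distance between two probability vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))


def hellinger(p, q):
    """H(P, Q) = (sum (sqrt p - sqrt q)^2)^{1/2}, with no factor 1/2."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def random_pmf(k, rng):
    """A random probability vector with k strictly positive entries."""
    weights = [rng.random() + 1e-12 for _ in range(k)]
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(0)
worst_ratio = 0.0
for _ in range(1000):
    p, q = random_pmf(5, rng), random_pmf(5, rng)
    worst_ratio = max(worst_ratio, 0.5 * l1(p, q) / hellinger(p, q))
print(worst_ratio)
```

By (A.1) the printed ratio $\|P - Q\|_{TV}/H(P, Q)$ can never exceed $1$.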

Fact Appendix A.3. Let $P$ and $Q$ be two product measures defined on the same sample space: $P = \bigotimes_{i=1}^n P_i$, $Q = \bigotimes_{i=1}^n Q_i$. Then

$$H^2(P, Q) \le \sum_{i=1}^n H^2(P_i, Q_i). \qquad (A.2)$$
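Inequality (A.2) comes from an exact identity: with $H$ as in Fact Appendix A.2, $H^2(P_i, Q_i) = 2(1 - \rho_i)$ where $\rho_i = \int \sqrt{g_{P_i} g_{Q_i}}$, the affinity $\rho$ factorizes over products, and $1 - \prod_i (1 - a_i) \le \sum_i a_i$. The sketch below, for finite spaces and with illustrative helper names, builds the product measures explicitly and checks both the identity $H^2(P, Q) = 2\big(1 - \prod_i (1 - H^2(P_i, Q_i)/2)\big)$ and the bound.

```python
import itertools
import math
import random


def hellinger_sq(p, q):
    """H^2 = sum (sqrt p - sqrt q)^2 (no factor 1/2)."""
    return sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))


def product_pmf(marginals):
    """pmf of the product measure, listed in itertools.product (lexicographic) order."""
    return [math.prod(atoms) for atoms in itertools.product(*marginals)]


def random_pmf(k, rng):
    weights = [rng.random() + 1e-12 for _ in range(k)]
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(42)
ps = [random_pmf(k, rng) for k in (3, 4, 2)]
qs = [random_pmf(k, rng) for k in (3, 4, 2)]

joint = hellinger_sq(product_pmf(ps), product_pmf(qs))
marginal_sum = sum(hellinger_sq(p, q) for p, q in zip(ps, qs))
# Exact identity: H^2(P, Q) = 2 (1 - prod_i (1 - H^2(P_i, Q_i) / 2)).
exact = 2 * (1 - math.prod(1 - hellinger_sq(p, q) / 2 for p, q in zip(ps, qs)))
print(joint, exact, marginal_sum)
```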

Fact Appendix A.4. Let $P_i$, $i = 1, 2$, be the law of a Poisson random variable with mean $\lambda_i$. Then

$$H^2(P_1, P_2) = 2\Big(1 - \exp\Big(-\frac12\big(\sqrt{\lambda_1} - \sqrt{\lambda_2}\big)^2\Big)\Big).$$
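With $H$ defined as in Fact Appendix A.2 (no factor $1/2$), the affinity of two Poisson laws is $\sum_k \sqrt{p_1(k) p_2(k)} = \exp\big(-\frac12(\sqrt{\lambda_1} - \sqrt{\lambda_2})^2\big)$, which gives the closed form $H^2(P_1, P_2) = 2\big(1 - \exp(-\frac12(\sqrt{\lambda_1} - \sqrt{\lambda_2})^2)\big)$. The sketch below compares it with a direct (truncated) summation over the support; the truncation level is an assumption adequate for moderate means.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam), computed in log space."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def hellinger_sq_direct(lam1: float, lam2: float, k_max: int = 400) -> float:
    """Truncated sum of (sqrt p1(k) - sqrt p2(k))^2 over k = 0..k_max."""
    return sum((math.sqrt(poisson_pmf(k, lam1)) - math.sqrt(poisson_pmf(k, lam2))) ** 2
               for k in range(k_max + 1))


def hellinger_sq_closed(lam1: float, lam2: float) -> float:
    """Closed form 2 (1 - exp(-(sqrt(lam1) - sqrt(lam2))^2 / 2))."""
    return 2.0 * (1.0 - math.exp(-0.5 * (math.sqrt(lam1) - math.sqrt(lam2)) ** 2))


for lam1, lam2 in ((3.0, 5.0), (10.0, 12.5), (0.5, 2.0)):
    print(lam1, lam2, hellinger_sq_direct(lam1, lam2), hellinger_sq_closed(lam1, lam2))
```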

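Fact Appendix A.5. Let $Q_1 \sim \mathscr N(\mu_1, \sigma_1^2)$ and $Q_2 \sim \mathscr N(\mu_2, \sigma_2^2)$. Then

$$\|Q_1 - Q_2\|_{TV} \le \sqrt{2\Big(1 - \frac{\sigma_1^2}{\sigma_2^2}\Big)^2 + \frac{(\mu_1 - \mu_2)^2}{2\sigma_1^2}}.$$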

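Fact Appendix A.6. For $i = 1, 2$, let $Q_i$ be the law on $(C, \mathscr C)$ of a Gaussian process of the form

$$X^i_t = \int_0^t h_i(s)\,ds + \int_0^t \sigma(s)\,dW_s, \qquad t \in [0, T],$$

where $h_i \in L_2(\mathbb R)$ and $\sigma$ takes values in $\mathbb R_{>0}$. Then:

$$L_1(Q_1, Q_2) \le \sqrt{\int_0^T \frac{(h_1(y) - h_2(y))^2}{\sigma^2(y)}\,dy}.$$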

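Property Appendix A.7. Let $\mathscr P_i = (\mathscr X_i, \mathscr A_i, \{P_{i,\theta};\ \theta \in \Theta\})$, $i = 1, 2$, be two statistical models. Let $S: \mathscr X_1 \to \mathscr X_2$ be a sufficient statistic such that the distribution of $S$ under $P_{1,\theta}$ is equal to $P_{2,\theta}$. Then $\Delta(\mathscr P_1, \mathscr P_2) = 0$.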
Remark Appendix A.8. Let $P_i$ be a probability measure on $(E_i, \mathscr E_i)$ and $K_i$ a Markov kernel from $(E_i, \mathscr E_i)$ to $(G_i, \mathscr G_i)$. One can then define a Markov kernel $K$ from $\prod_{i=1}^n E_i$ to $\big(\prod_{i=1}^n G_i, \bigotimes_{i=1}^n \mathscr G_i\big)$ in the following way:

$$K(x_1, \dots, x_n; A_1 \times \dots \times A_n) := \prod_{i=1}^n K_i(x_i, A_i), \qquad \forall x_i \in E_i,\ \forall A_i \in \mathscr G_i.$$

Clearly $K \bigotimes_{i=1}^n P_i = \bigotimes_{i=1}^n K_i P_i$.
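On finite spaces the product kernel of Remark Appendix A.8 is the Kronecker product of the row-stochastic matrices, and the identity $K \bigotimes_i P_i = \bigotimes_i K_i P_i$ can be checked directly. The sketch below flattens product spaces in lexicographic order; the two marginals and kernels are arbitrary illustrative values.

```python
import itertools
import math


def push_forward(p, kernel):
    """KP on finite spaces; kernel[x] is the probability vector K(x, .)."""
    return [sum(p[x] * kernel[x][a] for x in range(len(p))) for a in range(len(kernel[0]))]


def product_pmf(marginals):
    """Product measure, flattened in lexicographic (itertools.product) order."""
    return [math.prod(atoms) for atoms in itertools.product(*marginals)]


def product_kernel(kernels):
    """K(x_1..x_n; a_1..a_n) = prod_i K_i(x_i, a_i), flattened in the same order."""
    rows = itertools.product(*[range(len(k)) for k in kernels])
    cols = list(itertools.product(*[range(len(k[0])) for k in kernels]))
    return [[math.prod(k[x][a] for k, x, a in zip(kernels, xs, col)) for col in cols]
            for xs in rows]


P1, K1 = [0.2, 0.8], [[0.9, 0.1, 0.0], [0.3, 0.3, 0.4]]
P2, K2 = [0.5, 0.25, 0.25], [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]

lhs = push_forward(product_pmf([P1, P2]), product_kernel([K1, K2]))
rhs = product_pmf([push_forward(P1, K1), push_forward(P2, K2)])
print(lhs)
print(rhs)
```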

Finally, we recall the following result that allows us to bound the Le Cam 
distance between Poisson and Gaussian variables. 

Theorem Appendix A.9. (See [4], Theorem 4) Let $P_\lambda$ be the law of a Poisson random variable $X_\lambda$ with mean $\lambda$. Furthermore, let $P'_\lambda$ be the law of a random variable $Z'_\lambda$ with Gaussian distribution $\mathscr N(2\sqrt\lambda, 1)$, and let $U$ be a uniform variable on $[-\frac12, \frac12)$ independent of $X_\lambda$. Define

$$Z_\lambda = 2\,\mathrm{sgn}(X_\lambda + U)\sqrt{|X_\lambda + U|}. \qquad (A.3)$$

Then, denoting by $P^*_\lambda$ the law of $Z_\lambda$,

$$H^2(P^*_\lambda, P'_\lambda) = O(\lambda^{-1}).$$
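Transformation (A.3) is a variance-stabilizing map: $Z_\lambda$ should have mean close to $2\sqrt\lambda$ and variance close to $1$, matching $\mathscr N(2\sqrt\lambda, 1)$. Because $U$ is uniform, the conditional moments given $X_\lambda = k$ are explicit integrals, so the sketch below computes the mean and variance of $Z_\lambda$ exactly as a truncated Poisson series rather than by simulation (the truncation level is an assumption, generous for the means used).

```python
import math


def z_moments(lam: float) -> tuple[float, float]:
    """Mean and variance of Z = 2 sgn(X + U) sqrt|X + U|, X ~ Poisson(lam), U ~ U[-1/2, 1/2).

    Conditionally on X = k the integrals over U are explicit, so the moments are a
    truncated Poisson series; no simulation is involved.
    """
    k_max = int(lam + 20.0 * math.sqrt(lam) + 50.0)
    mean = 0.0
    second = 0.0
    for k in range(k_max + 1):
        p_k = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
        if k == 0:
            mean_k = 0.0    # integral of 2 sgn(u) sqrt|u| over [-1/2, 1/2) vanishes by symmetry
            second_k = 1.0  # integral of 4 |u| over [-1/2, 1/2)
        else:
            mean_k = (4.0 / 3.0) * ((k + 0.5) ** 1.5 - (k - 0.5) ** 1.5)
            second_k = 4.0 * k
        mean += p_k * mean_k
        second += p_k * second_k
    return mean, second - mean ** 2


for lam in (10.0, 100.0, 1000.0):
    m, v = z_moments(lam)
    print(lam, m - 2 * math.sqrt(lam), v)
```

The mean defect $\mathbb E Z_\lambda - 2\sqrt\lambda$ behaves like $-1/(4\sqrt\lambda)$ and the variance approaches $1$ as $\lambda$ grows.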

Remark Appendix A.10. Thanks to Theorem Appendix A.9, denoting by $\Lambda$ a subset of $\mathbb R_{>0}$ and by $\mathscr P$ (resp. $\mathscr P'$) the statistical model associated with the family of probabilities $\{P_\lambda: \lambda \in \Lambda\}$ (resp. $\{P'_\lambda: \lambda \in \Lambda\}$), we have

$$\Delta(\mathscr P, \mathscr P') \le C \sup_{\lambda \in \Lambda} \frac{1}{\sqrt\lambda},$$

for some constant $C$. Indeed, the correspondence associating $Z_\lambda$ to $X_\lambda$ defines a Markov kernel; conversely, associating to $Z_\lambda$ the closest integer to $Z_\lambda^2/4$ defines a Markov kernel going in the other direction.
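The two kernels of Remark Appendix A.10 can be sketched pathwise: the forward map is (A.3), and the reverse map rounds $Z^2/4$ to the nearest integer. Since $X_\lambda + U \in [X_\lambda - \frac12, X_\lambda + \frac12)$, the reverse map recovers $X_\lambda$ exactly from $Z_\lambda$ (up to the null event $U = -\frac12$, which the grid below avoids); applied to the Gaussian variable instead, it gives the kernel going from the Gaussian model back to the Poisson one. Function names are illustrative.

```python
import math


def forward(x: int, u: float) -> float:
    """Transformation (A.3): Z = 2 sgn(x + u) sqrt|x + u|."""
    y = x + u
    return 2.0 * math.copysign(math.sqrt(abs(y)), y)


def backward(z: float) -> int:
    """Reverse map: closest integer to z**2 / 4."""
    return math.floor(z * z / 4.0 + 0.5)


U_GRID = [i / 100 - 0.49 for i in range(99)]  # stays away from the endpoint -1/2
recovered = all(backward(forward(k, u)) == k for k in range(300) for u in U_GRID)
print(recovered)
```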

Appendix A.2. Levy processes 

Definition Appendix A.11. A stochastic process $\{X_t: t \ge 0\}$ on $\mathbb R$ defined on a probability space $(\Omega, \mathscr A, \mathbb P)$ is called a Levy process if the following conditions are satisfied.

1. $X_0 = 0$ $\mathbb P$-a.s.

2. For any choice of $n \ge 1$ and $0 \le t_0 < t_1 < \dots < t_n$, the random variables $X_{t_0}, X_{t_1} - X_{t_0}, \dots, X_{t_n} - X_{t_{n-1}}$ are independent.

3. The distribution of $X_{s+t} - X_s$ does not depend on $s$.

4. There is $\Omega_0 \in \mathscr A$ with $\mathbb P(\Omega_0) = 1$ such that, for every $\omega \in \Omega_0$, $X_t(\omega)$ is right-continuous in $t \ge 0$ and has left limits in $t > 0$.

5. It is stochastically continuous.

Thanks to the Levy-Khintchine formula, the characteristic function of any Levy process $\{X_t\}$ can be expressed, for all $u$ in $\mathbb R$, as:

$$\mathbb E\big[e^{iuX_t}\big] = \exp\bigg(-t\Big(-iub + \frac{u^2\sigma^2}{2} + \int_{\mathbb R} \big(1 - e^{iuy} + iuy\,\mathbb I_{|y| \le 1}\big)\,\nu(dy)\Big)\bigg),$$

where $b, \sigma \in \mathbb R$ and $\nu$ is a measure on $\mathbb R$ satisfying

$$\nu(\{0\}) = 0 \quad \text{and} \quad \int_{\mathbb R} (|y|^2 \wedge 1)\,\nu(dy) < \infty.$$

In the sequel we shall refer to $(b, \sigma^2, \nu)$ as the characteristic triplet of the process $\{X_t\}$ and $\nu$ will be called the Levy measure. These data uniquely characterize the law of the process $\{X_t\}$.

Let $D = D([0, \infty), \mathbb R)$ be the space of mappings $\omega$ from $[0, \infty)$ into $\mathbb R$ that are right-continuous with left limits. Define the canonical process $x: D \to D$ by

$$\forall \omega \in D, \quad x_t(\omega) = \omega_t, \quad \forall t \ge 0.$$

Let $\mathscr D_t$ and $\mathscr D$ be the $\sigma$-algebras generated by $\{x_s: 0 \le s \le t\}$ and $\{x_s: 0 \le s < \infty\}$, respectively (here, we use the same notations as in [44]).

By condition (4) above, any Levy process on $\mathbb R$ induces a probability measure $P$ on $(D, \mathscr D)$. Thus $\{x_t\}$ on the probability space $(D, \mathscr D, P)$ is identical in law with the original Levy process. By saying that $(\{x_t\}, P)$ is a Levy process, we mean that $\{x_t: t \ge 0\}$ is a Levy process under the probability measure $P$ on $(D, \mathscr D)$. For all $t > 0$ we will denote by $P_t$ the restriction of $P$ to $\mathscr D_t$. In the case where $\int_{|y| \le 1} |y|\,\nu(dy) < \infty$, we set $\gamma^\nu := \int_{|y| \le 1} y\,\nu(dy)$. Note that, if $\nu$ is a finite Levy measure, then the process having characteristic triplet $(\gamma^\nu, 0, \nu)$ is a compound Poisson process.
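The last remark can be checked numerically: with triplet $(\gamma^\nu, 0, \nu)$ and $\nu$ finite, the drift $\gamma^\nu$ cancels the compensating term $-iu\int_{|y| \le 1} y\,\nu(dy)$, so the Levy-Khintchine exponent $\psi(u)$, defined by $\mathbb E[e^{iuX_t}] = e^{t\psi(u)}$, reduces to the compound Poisson exponent $\int (e^{iuy} - 1)\,\nu(dy)$. The sketch below assumes a hypothetical example (rate $3$, jump sizes uniform on $(0, 2)$) and evaluates the integral by a midpoint rule.

```python
import cmath


def levy_khintchine_exponent(u, b, sigma2, nu_density, lo, hi, n=20_000):
    """psi(u) = i u b - u^2 sigma^2 / 2 + int (e^{iuy} - 1 - i u y 1_{|y|<=1}) nu(dy),
    for nu(dy) = nu_density(y) dy supported in [lo, hi] (midpoint rule)."""
    h = (hi - lo) / n
    integral = 0j
    for k in range(n):
        y = lo + (k + 0.5) * h
        small_jump = 1j * u * y if abs(y) <= 1.0 else 0j
        integral += (cmath.exp(1j * u * y) - 1.0 - small_jump) * nu_density(y) * h
    return 1j * u * b - 0.5 * u * u * sigma2 + integral


# Hypothetical compound Poisson example: rate 3, jump sizes Uniform(0, 2),
# i.e. nu(dy) = (3 / 2) 1_{(0, 2)}(y) dy, a finite Levy measure.
RATE = 3.0


def nu_density(y: float) -> float:
    return RATE / 2.0


GAMMA_NU = RATE / 4.0  # gamma^nu = int_0^1 y (RATE / 2) dy


def compound_poisson_exponent(u: float) -> complex:
    """RATE * (E[e^{iuJ}] - 1) with J ~ Uniform(0, 2)."""
    return RATE * ((cmath.exp(2j * u) - 1.0) / (2j * u) - 1.0)


for u in (0.3, 1.0, 2.5):
    print(u, levy_khintchine_exponent(u, GAMMA_NU, 0.0, nu_density, 0.0, 2.0),
          compound_poisson_exponent(u))
```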

Here and in the sequel we will denote by $\Delta x_r$ the jump of the process $\{x_t\}$ at time $r$:

$$\Delta x_r = x_r - \lim_{s \uparrow r} x_s.$$

For the proof of Theorems 2.5 and 2.6 we also need some results on the equivalence of measures for Levy processes. By the notation $\ll$ we will mean “is absolutely continuous with respect to”.
Theorem Appendix A.12 (See [44], Theorems 33.1-33.2 and [45], Corollary 3.18, Remark 3.19). Let $P^1$ (resp. $P^2$) be the law induced on $(D, \mathscr D)$ by a Levy process of characteristic triplet $(\eta, 0, \nu_1)$ (resp. $(0, 0, \nu_2)$), where

$$\eta = \int_{|y| \le 1} y\,(\nu_1 - \nu_2)(dy) \qquad (A.4)$$

is supposed to be finite. Then $P^1_t \ll P^2_t$ for all $t > 0$ if and only if $\nu_1 \ll \nu_2$ and the density $\frac{d\nu_1}{d\nu_2}$ satisfies

$$\int \bigg(\sqrt{\frac{d\nu_1}{d\nu_2}(y)} - 1\bigg)^2\,\nu_2(dy) < \infty. \qquad (A.5)$$

Remark that the finiteness in (A.5) implies that in (A.4). When $P^1_t \ll P^2_t$, the density is

$$\frac{dP^1_t}{dP^2_t}(x) = e^{U_t(x)},$$

with

$$U_t(x) = \lim_{\varepsilon \to 0} \bigg(\sum_{r \le t,\ |\Delta x_r| > \varepsilon} \ln \frac{d\nu_1}{d\nu_2}(\Delta x_r) - t \int_{|y| > \varepsilon} \Big(\frac{d\nu_1}{d\nu_2}(y) - 1\Big)\,\nu_2(dy)\bigg), \qquad P^{(0,0,\nu_2)}\text{-a.s.} \qquad (A.6)$$

The convergence in (A.6) is uniform in $t$ on any bounded interval, $P^{(0,0,\nu_2)}$-a.s. Besides, $\{U_t(x)\}$ defined by (A.6) is a Levy process satisfying $\mathbb E_{P^{(0,0,\nu_2)}}[e^{U_t}] = 1$, $\forall t \ge 0$.
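When both Levy measures are finite, the limit in (A.6) is not needed: there are finitely many jumps on $[0, t]$ and the compensator equals $t(\nu_1(\mathbb R) - \nu_2(\mathbb R))$. The Monte Carlo sketch below checks $\mathbb E_{P^{(0,0,\nu_2)}}[e^{U_t}] = 1$ in a hypothetical example: $\nu_2$ the uniform law on $(0, 1]$ (total mass $1$) and $\nu_1(dy) = 2y\,dy$ on $(0, 1]$ (also mass $1$), so that $\frac{d\nu_1}{d\nu_2}(y) = 2y$ and, by (A.4), $\eta = 2/3 - 1/2 = 1/6$. The Poisson sampler is Knuth's multiplication method, chosen only because the mean is small.

```python
import math
import random

# Hypothetical finite Levy measures on (0, 1]: nu_2 uniform, nu_1(dy) = 2y dy.
NU1_MASS = 1.0
NU2_MASS = 1.0


def density_ratio(y: float) -> float:
    """d nu_1 / d nu_2 evaluated at the jump size y."""
    return 2.0 * y


def poisson_knuth(mean: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for small means."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def sample_u_t(t: float, rng: random.Random) -> float:
    """One draw of U_t under P^{(0,0,nu_2)} (finite measures, so no eps -> 0 limit)."""
    n_jumps = poisson_knuth(t * NU2_MASS, rng)
    log_sum = 0.0
    for _ in range(n_jumps):
        y = 1.0 - rng.random()  # jump size in (0, 1], drawn from nu_2 / nu_2(R)
        log_sum += math.log(density_ratio(y))
    # Compensator: t * int (d nu_1/d nu_2 - 1) d nu_2 = t * (nu_1(R) - nu_2(R)).
    return log_sum - t * (NU1_MASS - NU2_MASS)


rng = random.Random(2024)
N_SAMPLES = 200_000
mc_mean = sum(math.exp(sample_u_t(1.0, rng)) for _ in range(N_SAMPLES)) / N_SAMPLES
print(mc_mean)
```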

Finally, let us consider the following result giving an explicit bound for the $L_1$ and the Hellinger distances between two Levy processes of characteristic triplets of the form $(b_i, 0, \nu_i)$, $i = 1, 2$, with $b_1 - b_2 = \int_{|y| \le 1} y\,(\nu_1 - \nu_2)(dy)$.

Theorem Appendix A.13 (See [30]). For any $0 < T < \infty$, let $P_i^T$ be the probability measure induced on $(D, \mathscr D_T)$ by a Levy process of characteristic triplet $(b_i, 0, \nu_i)$, $i = 1, 2$, and suppose that $\nu_1 \ll \nu_2$.
If $H^2(\nu_1, \nu_2) := \int \Big(\sqrt{\frac{d\nu_1}{d\nu_2}(y)} - 1\Big)^2\,\nu_2(dy) < \infty$, then
$$H^2(P_1^T, P_2^T) \le T\,H^2(\nu_1, \nu_2).$$
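The simplest instance of Theorem Appendix A.13, read as $H^2(P_1^T, P_2^T) \le T\,H^2(\nu_1, \nu_2)$ with $H$ as in Fact Appendix A.2, is that of two Poisson processes: $\nu_i = \lambda_i \delta_1$ and $b_i = \gamma^{\nu_i} = \lambda_i$, which satisfies the drift condition. The number of jumps $N_T$ is sufficient (the likelihood ratio is $(\lambda_1/\lambda_2)^{N_T} e^{-T(\lambda_1 - \lambda_2)}$), so the path-space Hellinger distance equals that between $\mathrm{Poisson}(T\lambda_1)$ and $\mathrm{Poisson}(T\lambda_2)$. The sketch below compares both sides for a few horizons; the rates are illustrative.

```python
import math


def hellinger_sq_poisson_paths(t: float, lam1: float, lam2: float) -> float:
    """H^2 between the laws on [0, t] of Poisson processes with rates lam1 and lam2.

    N_t is sufficient, so this is H^2(Poisson(t * lam1), Poisson(t * lam2)).
    """
    return 2.0 * (1.0 - math.exp(-0.5 * t * (math.sqrt(lam1) - math.sqrt(lam2)) ** 2))


def levy_measure_hellinger_sq(lam1: float, lam2: float) -> float:
    """H^2(nu_1, nu_2) = int (sqrt(d nu_1 / d nu_2) - 1)^2 d nu_2 for nu_i = lam_i * delta_1."""
    return (math.sqrt(lam1 / lam2) - 1.0) ** 2 * lam2


LAM1, LAM2 = 2.0, 3.0
for t in (0.01, 0.1, 1.0, 10.0, 100.0):
    print(t, hellinger_sq_poisson_paths(t, LAM1, LAM2), t * levy_measure_hellinger_sq(LAM1, LAM2))
```

The bound is sharp for short horizons and becomes loose for large $T$, where the left-hand side saturates at $2$.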

We conclude the Appendix with a technical statement about the Le Cam 
distance for finite variation models. 

Lemma Appendix A.14. 

$$\Delta(\mathscr P_n^{\nu_0}, \mathscr P_{n,FV}^{\nu_0}) = 0.$$
Proof. Consider the Markov kernels $\pi_1$, $\pi_2$ defined as follows:

$$\pi_1(x, A) = \mathbb I_A(x^d), \qquad \pi_2(x, A) = \mathbb I_A(x - \cdot\,\gamma^{\nu_0}), \qquad \forall x \in D,\ A \in \mathscr D,$$

where we have denoted by $x^d$ the discontinuous part of the trajectory $x$, i.e. $\Delta x_r = x_r - \lim_{s \uparrow r} x_s$, $x^d_t = \sum_{r \le t} \Delta x_r$, and by $x - \cdot\,\gamma^{\nu_0}$ the trajectory $x_t - t\gamma^{\nu_0}$, $t \in [0, T_n]$. On the one hand we have:

$$\pi_1 P^{(\gamma^{\nu - \nu_0}, 0, \nu)}(A) = \int_D \pi_1(x, A)\,P^{(\gamma^{\nu - \nu_0}, 0, \nu)}(dx) = \int_D \mathbb I_A(x^d)\,P^{(\gamma^{\nu - \nu_0}, 0, \nu)}(dx) = P^{(\gamma^\nu, 0, \nu)}(A),$$

where in the last equality we have used the fact that, under $P^{(\gamma^{\nu - \nu_0}, 0, \nu)}$, $\{x^d_t\}$ is a Levy process with characteristic triplet $(\gamma^\nu, 0, \nu)$ (see [44], Theorem 19.3). On the other hand:

$$\pi_2 P^{(\gamma^\nu, 0, \nu)}(A) = \int_D \pi_2(x, A)\,P^{(\gamma^\nu, 0, \nu)}(dx) = \int_D \mathbb I_A(x - \cdot\,\gamma^{\nu_0})\,P^{(\gamma^\nu, 0, \nu)}(dx) = P^{(\gamma^{\nu - \nu_0}, 0, \nu)}(A),$$

since, by definition, $\gamma^\nu - \gamma^{\nu_0}$ is equal to $\gamma^{\nu - \nu_0}$. The conclusion follows by the definition of the Le Cam distance. □
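Both kernels in the proof are deterministic maps on trajectories, and on finite-variation paths they invert each other: a path under $P^{(\gamma^{\nu - \nu_0}, 0, \nu)}$ is its jump part plus the linear drift $-t\gamma^{\nu_0}$, $\pi_1$ removes that drift, and $\pi_2$ puts it back. The sketch below assumes a hypothetical representation of paths with finitely many jumps (drift plus a list of jump times and sizes, as for compound Poisson processes) and an illustrative value of $\gamma^{\nu_0}$.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FVPath:
    """Hypothetical finite-jump representation of x_t = drift * t + sum_{r <= t} jumps."""
    drift: float
    jump_times: tuple[float, ...]
    jump_sizes: tuple[float, ...]

    def value(self, t: float) -> float:
        return self.drift * t + sum(s for r, s in zip(self.jump_times, self.jump_sizes) if r <= t)


def pi1(x: FVPath) -> FVPath:
    """x -> x^d: keep only the jumps (the discontinuous part of the trajectory)."""
    return FVPath(0.0, x.jump_times, x.jump_sizes)


def pi2(x: FVPath, gamma_nu0: float) -> FVPath:
    """x -> (x_t - t * gamma^{nu_0})_t."""
    return FVPath(x.drift - gamma_nu0, x.jump_times, x.jump_sizes)


GAMMA_NU0 = 0.8  # hypothetical value of gamma^{nu_0}
# A path of the form seen under P^{(gamma^{nu - nu_0}, 0, nu)}: jumps plus drift -gamma^{nu_0}.
x = FVPath(-GAMMA_NU0, (0.3, 1.1, 2.5), (0.4, -0.2, 0.7))
x_d = pi1(x)  # a path of the form seen under P^{(gamma^nu, 0, nu)}: pure jumps
print(x.value(3.0), x_d.value(3.0))
```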